Abstract
Neural radiance fields (NeRFs) have revolutionized novel view synthesis, leading to an unprecedented level of realism in rendered images. However, the reconstruction quality of NeRFs suffers significantly from out-of-focus regions in the input images. We propose NeRF-FF, a plug-in strategy that estimates image masks based on Focus Frustums (FFs), i.e., the visible volume in the scene space that is in-focus. NeRF-FF enables a subsequently trained NeRF model to omit out-of-focus image regions during the training process. Existing methods to mitigate the effects of defocus blurred input images often leverage dynamic ray generation. This makes them incompatible with the static ray assumptions employed by runtime-performance-optimized NeRF variants, such as Instant-NGP, leading to high training times. Our experiments show that NeRF-FF outperforms state-of-the-art approaches regarding training time by two orders of magnitude—reducing it to under 1 min on end-consumer hardware—while maintaining comparable visual quality.
1 Introduction
The reconstruction and subsequent novel view synthesis based on 2D input images of complex 3D scenes is a long-standing research problem in computer vision. Recently, the introduction of neural radiance fields (NeRFs) [30] has led to enormous progress regarding the rendering of photorealistic views from multi-view images. NeRFs predict radiance and volume density from a 3D spatial location and viewing direction using a multilayer perceptron (MLP). The outputs of this MLP can be rendered by established volume rendering techniques [15].
Most NeRF variants are restricted to well-defined capturing environments, with noise-free images of a static scene with known camera parameters. Therefore, most NeRF variants produce inferior rendering results when the input images suffer from different types of blur. The most prominent examples are motion blur, where the camera changes position during exposure time leading to superposition of multiple views, and defocus blur that is caused by a suboptimal choice of camera aperture. There are multiple approaches that address the issue of motion blur [8, 21, 40, 53], defocus blur [59] or both [20, 26, 37] in the context of NeRFs. However, while there is previous work that has shown great progress in accelerating NeRF training and rendering times [4, 10, 31, 48], most of the existing blur-resistant contributions do not leverage the potential of these accelerations. This can be attributed to the fact that most of the blur-resistant strategies rely on dynamically generated rays, while the accelerated NeRF methods leverage explicit density volume representations to prune low density voxels. These strategies are based on the assumption of static rays.
In this paper, we propose NeRF-FF, a novel plug-in method that can be combined with a multitude of NeRF and NeRF-like novel view synthesis strategies and that significantly improves the rendering quality obtained from blurred input images. NeRF-FF calculates image masks that contain the in-focus or only slightly out-of-focus regions of the input images, yielding a beneficial trade-off between image quality and scene coverage. To achieve this, NeRF-FF first identifies in-focus regions by applying the discrete Laplace operator, which detects sharp edges that occur predominantly in in-focus regions. These feature points are leveraged to estimate the visible and in-focus volume for each image in 3D scene space using depth maps from a previously trained NeRF. This volume has the shape of a pyramid cropped by a near and a far plane. We refer to it as a Focus Frustum.
These Focus Frustums are then refined by expanding their near and far planes such that each point of the relevant scene geometry is part of at least a certain percentage of the Focus Frustums. Projecting the dense scene points encompassed by the Focus Frustums into the respective image space yields image masks that indicate the pixels on which a subsequent NeRF model can be trained to obtain unblurred novel views. Thus, we provide a trade-off between the reconstruction of the complete scene geometry and the usage of optimal, i.e., in-focus or only slightly out-of-focus, views.
Our experiments show that the resulting image quality of NeRF-FF is comparable to state-of-the-art strategies. At the same time, NeRF-FF can be used in combination with accelerated NeRF variants like Instant-NGP [31], which is not easily integrable with state-of-the-art methods like PDRF [37]. The proposed combination of NeRF-FF and Instant-NGP [31] outperforms the state of the art by two orders of magnitude regarding training time, bringing it down to under 1 min (see Fig. 1).
In summary, this paper offers the following contributions:
- We propose NeRF-FF, a plug-in strategy that estimates image masks based on Focus Frustums (FFs), i.e., the visible volume in scene space that is in-focus, that allow a subsequent NeRF model to omit out-of-focus image regions during training, including a reasonable choice of hyperparameters.
- We provide a quantitative analysis showing that the combination of NeRF-FF with Instant-NGP [31] yields resulting novel views that are comparable to the state of the art in quality while reducing the required computation time by two orders of magnitude. This results in end-to-end runtimes under 1 min on end-consumer hardware.
NeRF-FF in combination with Instant-NGP (NeRF-FF + iNGP) reduces training times on blurry input images by two orders of magnitude in comparison with the state of the art, while significantly improving Instant-NGP’s novel view synthesis capabilities on these inputs. Results were obtained on the real defocus dataset by Ma et al. [26] using an RTX 3090
2 Related work
Neural Radiance Fields (NeRFs). NeRFs [30] model the radiance of a static 3D scene as implicit representation, which can be used by classical volume rendering techniques [15] to perform photorealistic novel view synthesis—the rendering of an unknown viewing direction based on a set of input images. This approach has gained popularity in recent years, leading to many follow-up works that have extended NeRF capturing capabilities. There are NeRF variants that can handle dynamic scenes [22, 35, 3.5). The pipeline of NeRF-FF is illustrated in Fig. 2.
3.1 Initial depth estimates
Defocus blur manifests depending on the distance of surfaces from the camera plane. In order to distinguish in-focus regions from out-of-focus regions, our approach leverages spatial information from the scene space. Therefore, a mapping between points in 2D image space and their corresponding points in scene space is required. We train a preliminary Instant-NGP [31] instance on the blurry input images i with the color information \(I_i \in [0, 255]^{3 \times W \times H}\) to obtain per-view depth maps \(D_i \in \mathbb {R}^{W \times H}\). We denote the color information of the pixel at position (x, y) in image i as \(I_i(x,y)\) and that pixel’s depth as \(D_i(x,y)\). The depth maps \(D_i\), which are later used for the initial Focus Frustum estimation (Fig. 2b) and mask generation (Fig. 2d), are derived from the rays that are cast during the volume rendering process of Instant-NGP inference. Even though the reconstructed appearance of the preliminary Instant-NGP model suffers from the blurred image inputs, the estimated scene geometry is adequate for the purpose of generating sufficiently accurate depth maps. Note that any approach which reliably estimates a dense depth map from the blurry input images can be applied in this stage.
3.2 Discrete Laplace operator as DoF indicator
In-focus image regions generally contain sharp edges, be it through textured surfaces or object boundaries, which are greatly reduced by defocus blur [3, 16]. These edges can be detected using the discrete Laplace operator \(\Delta \). Pixels \(I_i(x,y)\) that exhibit high values after applying the discrete Laplace operator to the grayscaled input images indicate sharp edges and are therefore used as indicators for in-focus image regions. They are modeled as a set of indicator pixels \(P_i\) per image \(I_i\) with
\[ P_i = \{ (x,y) \mid \Delta (I_i)(x,y) > \tau \}, \]
where \(\Delta (I_i)(x,y)\) denotes the value of a pixel \(I_i(x,y)\) after applying the discrete Laplace operator to the grayscaled version of \(I_i\) and \(\tau \) denotes a threshold for the results of the discrete Laplace operator that functions as a hyperparameter for NeRF-FF. A suitable value range for \(\tau \) is experimentally identified in Sect. 4.2.
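As an illustration, the thresholding step can be sketched in a few lines of plain Python. This is a minimal sketch rather than the implementation used in the experiments: the 4-neighbour Laplace stencil, the skipped border pixels and the function name `laplace_indicators` are assumptions made for this example.

```python
def laplace_indicators(gray, tau):
    """Return the set of (x, y) pixels whose discrete Laplace response exceeds tau.

    gray is a list of rows (gray[y][x]) holding grayscale intensities.
    Border pixels are skipped, which is an assumption of this sketch.
    """
    height, width = len(gray), len(gray[0])
    indicators = set()
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # 4-neighbour Laplace stencil: sum of neighbours minus 4x the centre.
            response = (gray[y - 1][x] + gray[y + 1][x]
                        + gray[y][x - 1] + gray[y][x + 1]
                        - 4 * gray[y][x])
            if response > tau:
                indicators.add((x, y))
    return indicators


# Tiny example: a flat image with a single dark pixel at (x=2, y=2).
image = [[200] * 5 for _ in range(5)]
image[2][2] = 0
sharp = laplace_indicators(image, tau=80)
flat = laplace_indicators([[10] * 4 for _ in range(4)], tau=80)
```

A perfectly flat image yields no indicator pixels, which mirrors the observation that textureless or strongly blurred regions contribute few Laplacian features.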
3.3 Initial Focus Frustum estimation
Based on the focus indicator pixels \(P_i\) and the corresponding depth maps \(D_i\), we can compute the near and far planes that are parallel to the image plane and between which the in-focus regions are located. We call the volume that lies between these planes in the respective camera view frustum the Focus Frustum. The Focus Frustum represents the visible in-focus volume in the scene space for the corresponding image. Sharp edges often occur on object boundaries, which exhibit unstable behavior regarding their associated depth, depending on whether the depth of the foreground or background object is considered. These regions are vulnerable to even slight inaccuracies in the geometry estimation of the preliminary NeRF model. Therefore, it is likely that the sets of focus indicators \(P_i\) contain pixels that correspond to out-of-focus regions.
To mitigate the influence of these erroneous focus indicators, we compute the depths of the near and far planes \(n_i, f_i\) of the Focus Frustum \( FF _i\) based on the depth distribution of the focus indicator pixels:
where M is the median of the depth values of the focus indicator points in \(P_i\) according to the depth map \(D_i\), \(\sigma _\mu (X)\) denotes the standard deviation of the elements in set X in relation to \(\mu \), and \(P_i^<\) (\(P_i^>\)) is the subset of elements in \(P_i\) with a corresponding depth smaller (greater) than M. This follows the intuition that the depth of in-focus points in images with defocus blur follows a normal distribution. The probability of a point being in-focus is highest in the center of the distribution and decreases for closer and further points.
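The following sketch shows one plausible reading of this description, in which the near plane lies one deviation below the median and the far plane one deviation above it. The exact formula is not reproduced here, so the unit scaling (the hypothetical parameter `scale`), the helper names and the handling of empty subsets are assumptions of this example.

```python
import math
import statistics


def deviation_about(values, centre):
    """Root-mean-square deviation of the values about a given centre."""
    if not values:
        return 0.0  # assumption: an empty side contributes no spread
    return math.sqrt(sum((v - centre) ** 2 for v in values) / len(values))


def focus_planes(indicator_depths, scale=1.0):
    """Estimate (near, far) from the depths of the focus indicator pixels."""
    median = statistics.median(indicator_depths)
    closer = [d for d in indicator_depths if d < median]   # depths below M
    farther = [d for d in indicator_depths if d > median]  # depths above M
    near = median - scale * deviation_about(closer, median)
    far = median + scale * deviation_about(farther, median)
    return near, far


# One distant outlier (9.0) widens the far side only.
depths = [1.0, 2.0, 3.0, 4.0, 9.0]
near, far = focus_planes(depths)
```

Measuring the spread separately on each side of the median keeps a few erroneous indicators behind the focal plane from shifting the near plane, and vice versa.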
Based on these near and far planes, our initial Focus Frustums \( FF _i\) are defined as
where \(\textbf{v}_i\) denotes the forward facing vector of the camera pose that captured image i in world coordinates, \(K_i\) denotes the internal and \([R_i|t_i]\) denotes the external camera parameters of that camera. The first term of Eq. 4 signifies the depth of the corresponding volume lying between the near and far plane defined by \(n_i, f_i\), while the second term defines the points’ localization within the view frustum, i.e., visible volume in scene space, for the camera view corresponding to image i.
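The two conditions can be illustrated with a simple membership test. This is a hedged sketch under a standard pinhole model: it assumes that \([R_i|t_i]\) maps world to camera coordinates, that the camera looks along the positive z-axis of its own frame, and that the intrinsics are given as focal lengths and principal point. The function name `in_focus_frustum` and all parameter names are hypothetical.

```python
def in_focus_frustum(point, rotation, translation, intrinsics, size, near, far):
    """Check whether a 3D world point lies inside a Focus Frustum."""
    # World -> camera coordinates: p_cam = R @ p + t.
    cam = [sum(rotation[r][c] * point[c] for c in range(3)) + translation[r]
           for r in range(3)]
    depth = cam[2]  # distance along the viewing direction (assumed +z)
    # First condition: the point lies between the near and far plane.
    if depth <= 0 or not (near <= depth <= far):
        return False
    # Second condition: the point projects into the image, i.e. it is visible.
    fx, fy, cx, cy = intrinsics
    width, height = size
    u = fx * cam[0] / depth + cx
    v = fy * cam[1] / depth + cy
    return 0 <= u < width and 0 <= v < height


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
camera = dict(rotation=identity, translation=[0, 0, 0],
              intrinsics=(100.0, 100.0, 50.0, 50.0), size=(100, 100),
              near=2.0, far=4.0)
inside = in_focus_frustum([0.0, 0.0, 3.0], **camera)
too_far = in_focus_frustum([0.0, 0.0, 5.0], **camera)
off_image = in_focus_frustum([10.0, 0.0, 3.0], **camera)
```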
Figure 2b illustrates the process of estimating Focus Frustums based on the focus indicator points \(P_i\), the depth maps \(D_i\) and the corresponding camera views.
3.4 Focus Frustum refinement
Our approach provides image masks that regulate which image regions a subsequent NeRF model is trained on. The Focus Frustums dictate which depth ranges are considered for the generation of these masks by omitting pixels that get their color mainly from spatial positions outside of the Focus Frustum. Therefore, a low coverage of the scene geometry by the Focus Frustums leads to sparse inputs for these regions. Most NeRF variants suffer from significant quality loss for such sparse inputs. A low coverage of the scene volume can be caused by an insufficient number of Laplacian features in in-focus image regions, e.g., due to textureless surfaces, or by spatial positions that are in-focus in only a few or no images. To mitigate this issue, we iteratively refine the Focus Frustums by expanding their DoF, i.e., the distance between their near and far plane. For a set \( S \) of randomly sampled points in scene space with high optical density in the preliminary NeRF (indicating that this scene space is occupied by relevant scene geometry), we examine the number of Focus Frustums they are encompassed by (see Eq. 4). We refine the depths \(n_i, f_i\) of the near and far plane for each Focus Frustum \( FF _i\). We denote the updated near and far planes of the resulting refined Focus Frustums \( RFF _{i}\) as \(\hat{ n }_i, \hat{ f }_i\). The refined near and far planes’ depths are chosen such that the accumulated change in depth R (Eq. 5) is minimal, while any spatial position \(\textbf{p} \in S \) is part of at least a certain fraction \(\varrho \) of the refined Focus Frustums.
\(\varrho \) is a hyperparameter of NeRF-FF that trades off scene coverage against reconstruction quality of regions that are depicted in the Focus Frustums. We show the influence of different values of \(\varrho \) in Sect. 4.2.
For the calculation of R, we only consider Focus Frustums with a corresponding view frustum that encompasses the respective spatial position \(\textbf{p}\), i.e., only Focus Frustums are considered that could potentially encompass \(\textbf{p}\) if an arbitrarily large change of the near or far plane is performed.
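To make the refinement step concrete, the sketch below uses a simple greedy heuristic: for every sampled point, the Focus Frustums that need the smallest plane shift are expanded first until the required fraction is reached. This is not the optimization described above and does not guarantee a minimal accumulated change R. It also assumes that the fraction \(\varrho \) is taken relative to the frustums whose view frustum contains the point. The function name `refine_planes` and the input layout are hypothetical.

```python
import math


def refine_planes(planes, point_depths, rho):
    """Greedily expand (near, far) pairs until each point is covered often enough.

    planes:       list of (near, far), one pair per Focus Frustum.
    point_depths: per sampled point, a list holding the point's depth in each
                  camera, or None if the point is outside that view frustum.
    rho:          required fraction of (visible) frustums covering each point.
    """
    refined = [list(p) for p in planes]
    for depths in point_depths:
        visible = [i for i, d in enumerate(depths) if d is not None]
        # round() guards against float noise such as 0.7 * 10 > 7.
        required = math.ceil(round(rho * len(visible), 9))

        def cost(i):
            # Plane shift needed so that frustum i contains the point.
            near, far = refined[i]
            return max(near - depths[i], depths[i] - far, 0.0)

        covered = sum(1 for i in visible if cost(i) == 0.0)
        # Expand the cheapest uncovered frustums first.
        for i in sorted((i for i in visible if cost(i) > 0.0), key=cost):
            if covered >= required:
                break
            refined[i][0] = min(refined[i][0], depths[i])
            refined[i][1] = max(refined[i][1], depths[i])
            covered += 1
    return [tuple(p) for p in refined]


planes = [(2.0, 3.0), (2.0, 3.0), (5.0, 6.0)]
point_depths = [[2.5, 3.5, 4.0]]  # one sample point seen by all three cameras
refined = refine_planes(planes, point_depths, rho=0.5)
unchanged = refine_planes(planes, point_depths, rho=0.0)
```

In the example, the point is already covered by the first frustum, so only the second one, which needs the smaller shift, is expanded to reach two out of three.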
Figure 2c illustrates the refined Focus Frustums obtained in the previous pipeline step. They were expanded to encompass relevant scene geometry that was not sufficiently captured by the initial estimates.
3.5 Mask generation
Given the refined Focus Frustums \( RFF _i\), we calculate corresponding image masks \(\textit{Mask}_i\) with addressable pixels \(\textit{Mask}_i(x,y)\) at positions (x, y) that discriminate positions in image space depicting structures within the Focus Frustum against ones that depict structures on the outside. To generate these masks, we leverage the depth values from the depth maps \(D_i\) (Sect. 3.1). A pixel is activated in an image mask \( Mask _i\) if its corresponding depth value is part of the refined Focus Frustum’s focus range \([\hat{n}_i, \hat{f}_i]\):
\[ \textit{Mask}_i(x,y) = \begin{cases} 1 & \text{if } \hat{n}_i \le D_i(x,y) \le \hat{f}_i, \\ 0 & \text{otherwise.} \end{cases} \]
Note that since all pixels in the depth map are visible in their corresponding image, it is not necessary to consider the other boundaries of the Focus Frustum. A subsequent NeRF model is trained on the blurry input images only considering pixels \(I_i(x,y)\) with \(\textit{Mask}_i(x,y)=1\). This is illustrated in Fig. 2e. Due to the general nature of the results of NeRF-FF in the form of these masks, NeRF-FF is compatible with most NeRF variants.
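The mask rule amounts to a per-pixel depth range test, as the following minimal sketch shows. The function name `focus_mask` and the nested-list depth map are illustrative choices, not part of the method description.

```python
def focus_mask(depth_map, near, far):
    """Return a binary mask that is 1 where the depth lies in [near, far]."""
    return [[1 if near <= d <= far else 0 for d in row] for row in depth_map]


depth_map = [[1.0, 2.5, 3.0],
             [3.5, 4.0, 6.0]]
mask = focus_mask(depth_map, near=2.0, far=4.0)

# Pixel coordinates (x, y) a subsequent model would be trained on.
training_pixels = [(x, y)
                   for y, row in enumerate(mask)
                   for x, value in enumerate(row) if value == 1]
```

Because the output is an ordinary binary image, it can be handed to any training pipeline that supports per-image masks.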
4 Experiments
4.1 Implementation details
Training. NeRF-FF provides image masks for a subsequent training process with a NeRF variant. In our experiments, we evaluate NeRF-FF in combination with Instant-NGP (NeRF-FF + iNGP). Therefore, we employ Instant-NGP [31] as both the preliminary and the subsequent NeRF model. Note that current state-of-the-art methods are not easily usable in combination with Instant-NGP because Instant-NGP employs custom CUDA kernels, which are hard to integrate with existing strategies [37]. The preliminary NeRF is trained for 2,000 iterations and the subsequent NeRF for 4,000 iterations on a single NVIDIA RTX 3090. We use the standard Instant-NGP training parameters.
Datasets and evaluation.
We perform our experiments on the dataset introduced by Ma et al. [26], which provides 5 synthetic and 10 real-world scenes containing images suffering from defocus blur. Furthermore, we present the capabilities of NeRF-FF on the dataset presented by Wu et al. [59], which contains 6 real-world scenes.
For the synthetic scenes of Ma et al. [26], we train one model per scene on the training split. We compare the novel views that this model synthesizes for the views in the evaluation split with the respective ground truth images. For our analysis, we consider the median of the results per scene. Each real-world scene of Ma et al. [26] contains a set of in-focus reference images. For each provided reference image, we train a model on all images in the respective scene dataset except the examined reference image and compare the synthesized image to the reference. For each scene, we report the median of these results in Table 1. The average results are reported in Table 3.
For the real-world scenes of Wu et al. [59], a triplet of images consisting of two out-of-focus images and one in-focus image is provided per view. Analogous to the evaluation method of Wu et al., we train a model on the out-of-focus images from \(\frac{8}{9}\) of the views and compare the synthesized images of the remaining \(\frac{1}{9}\) of the views to their respective in-focus reference image. For each scene, we evaluate different splits until the union of these splits’ evaluation sets contains at least 20 images. Per scene, we report the median of these results in Table 2.
4.2 Hyperparameter optimization and ablation
Our approach relies on a set of hyperparameters (\(\tau \), \(\varrho \)). The threshold \(\tau \) determines which values resulting from the application of the discrete Laplace operator are considered indicators for in-focus regions (see Sect. 3.2). \(\varrho \) indicates the minimum fraction of Focus Frustums each randomly sampled scene point should be encompassed by after the FF refinement step (Sect. 3.4). This hyperparameter defines the trade-off between the sharpness of the image regions that are used for training and the coverage of the scene.
In this section, we identify reasonable values for these parameters by comparing resulting images regarding their visual quality measured by the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) [66]. The experiments are conducted on the blurred scenes from the real dataset by Ma et al. [26].
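For reference, PSNR follows the standard definition \(10 \log _{10}(\textit{MAX}^2 / \textit{MSE})\). The sketch below computes it for two small single-channel images. The 8-bit value range (`max_value=255.0`) and the nested-list image layout are assumptions of this example, not a description of the evaluation code.

```python
import math


def psnr(rendered, reference, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    flat_a = [v for row in rendered for v in row]
    flat_b = [v for row in reference for v in row]
    mse = sum((a - b) ** 2 for a, b in zip(flat_a, flat_b)) / len(flat_a)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


reference = [[100, 100], [100, 100]]
rendered = [[110, 90], [100, 100]]
score = psnr(rendered, reference)
perfect = psnr(reference, reference)
```

Higher values indicate a smaller mean squared error with respect to the reference image.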
To identify a suitable value for \(\varrho \), we consider values \(\varrho \in \{0, 0.1, \ldots , 1.0\}\). Note that \(\varrho =0\) corresponds to omitting the Focus Frustum refinement step, while \(\varrho =1\) is equivalent to training the subsequent NeRF model without applying NeRF-FF beforehand. For this experiment, we set \(\tau =80\), which constitutes an educated guess that turns out to be optimal for the dataset at hand. Figure 3 shows the results of this evaluation, which indicate that \(\varrho =0.4\) results in the best performance for PSNR and SSIM, while significantly lower or higher values for \(\varrho \) lead to dramatically decreased performance. Therefore, we choose \(\varrho =0.4\) for our subsequent experiments.
Qualitative comparison of NeRF-FF + iNGP (ours) with the current state of the art on the real-world defocus dataset of Ma et al. [26]. The depicted scenes are cake, caps, cupcake, cups and daisy (left to right)
Figure 3 also shows the results for different values of \(\tau \), the threshold on the results of the discrete Laplace operator that is employed for identifying DoF indicator pixels. The experiment is conducted with \(\varrho = 0.4\), since we have established it as the best-performing value. The results indicate that low values of \(\tau \) lead to poor performance. In these cases, even weak signals are considered indicators for in-focus regions, leading to feature points in regions that are essentially out-of-focus but exhibit strong texture. On the other hand, high values of \(\tau \) lead to slowly decreasing performance because even some sharp edges in in-focus regions are no longer considered relevant. The decline for higher values of \(\tau \), however, is small in comparison with the decline for low values of \(\tau \). This is most likely attributable to the fact that the indicator points are only used for the initial Focus Frustum estimates, which are then further refined. Smaller initial Focus Frustums, which occur for high values of \(\tau \), are compensated for by the subsequent Focus Frustum refinement. Following the results of this experiment, the value of \(\tau =80\) is identified as the optimal choice for subsequent experiments, as it maximizes PSNR and SSIM values.
4.3 Novel view synthesis from blurry inputs
Evaluation Methodology. In this section, we summarize the results of our proposed approach in combination with Instant-NGP (NeRF-FF + iNGP) on the synthetic and real-world dataset of Ma et al. [26] and the dataset proposed by Wu et al. [59]. Similar to previous works, we conduct our quantitative analysis of the examined approaches by comparing the rendered images to their respective reference image regarding their PSNR, SSIM, and LPIPS.
Real-World Comparison.
Table 3 and Figs. 4 and 5 show that our model produces results that are comparable to SOTA methods. We compare our approach to Deblur-NeRF [26], DP-NeRF [20] and PDRF [37] that leverage 3D scene information for the estimation of a blur kernel while simultaneously incorporating dynamic ray generation. Ma et al. [26] show that these methods significantly outperform the process of deblurring single images beforehand by using strategies like KPAC [45]. Our results show that NeRF-FF + iNGP leads to images with PSNR values that are on par with state-of-the-art work. Regarding SSIM, our approach consistently outperforms these methods by a fair margin. The results indicate that Instant-NGP outperforms these methods regarding SSIM as well, suggesting that these improvements in structural similarity can be attributed to the usage of that NeRF variant. However, our approach trails the visual quality of the reference deblurring strategies regarding its LPIPS score by a slight margin. While this indicates inferior quality regarding human perception, we consider this difference in rendering quality negligible in comparison with the achieved acceleration. Our approach only requires 45 s of training time on average, which constitutes an acceleration of two orders of magnitude in comparison with state-of-the-art methods. This training time encompasses both the execution of NeRF-FF, including the preliminary NeRF training, as well as the training of the Instant-NGP model.
The results of our experiment on the dataset by Wu et al. [59] are illustrated in Table 2 and Fig. 6. These results indicate that we consistently produce images of better quality than DoF-NeRF [59], which physically models defocus blur by estimating the camera optics—especially the lens—as a multilayer perceptron.
Synthetic Comparison. Table 3 shows the results of NeRF-FF + iNGP on the synthetic dataset of Ma et al. [26]. On this dataset, our approach underperforms in comparison with the state-of-the-art approaches. An analysis of the intermediary pipeline stages shows that this is caused by a low-quality dense reconstruction of the synthetic scenes leading to depth maps of inferior quality. This is caused by fully blurred images in the dataset. We discuss this issue further in Sect. 5.
Comparison of average PSNR, SSIM, LPIPS and training times between NeRF-FF + iNGP and state-of-the-art methods on the real defocus dataset by Ma et al. [26]. The results indicate that NeRF-FF + iNGP achieves comparable visual quality while decreasing the required training time by at least two orders of magnitude. For \(\uparrow \) (\(\downarrow \)), higher (lower) values indicate better results
4.4 Compatibility with other technologies
Table 4 illustrates the results of NeRF-FF when applied in combination with different volumetric novel view synthesis strategies. For these experiments, we employed NeRF-FF in combination with nerfacto—a NeRF variant—and splatfacto, which is based on 3D Gaussian Splatting [18]. These strategies are both implemented in Nerfstudio [50].
The applied evaluation strategy is analogous to the one presented in Sect. 4.1. The results show that applying the masks generated by NeRF-FF to the input images of nerfacto and splatfacto produces higher-quality rendering results, manifesting in improvements in the observed metrics PSNR, SSIM and LPIPS. The impact of NeRF-FF on the training duration is negligible. These results further underline that NeRF-FF is compatible with a multitude of other training time optimized NeRF or NeRF-like strategies.
5 Limitations and future work
Dense Reconstruction Quality. Low quality of the preliminary NeRF’s dense reconstruction subsequently reduces the quality of image masks and therefore the overall result of NeRF-FF and a subsequent NeRF model significantly. This inferior performance is caused by mismatches between in-focus indicators and the respective spatial position of the scene geometry that mainly contributes to this pixel’s rendered color. These mismatches lead to erroneous depth estimates of the in-focus indicators, negatively influencing the initial Focus Frustum estimates. An analysis of the occurring failure cases during our experiments shows that low-quality dense reconstructions often occur when the inputs contain images without in-focus regions which leads to floater artifacts—scene volume with high estimated optical density with no counterpart in the ground truth scene—in the reconstructed scene. Therefore, NeRF-FF is not suited for datasets containing fully blurred images. Future work could enhance the robustness of NeRF-FF by automatically pruning these images from the training dataset.
Floater artifacts. Some results of NeRF-FF + iNGP suffer from floater artifacts in the subsequent reconstruction of the scene based on the masked images (Fig. 7). These floater artifacts are also prevalent in unmasked Instant-NGP models and reduce image quality depending on the camera position. Recently, approaches to mitigate the influence of floater artifacts either by changing the applied training process [2] or by removing them after the training process [12, 57, 58] have been discussed. Integrating these strategies into the training process could further improve the visual quality of the results.
6 Conclusion
In this paper, we propose NeRF-FF, a plug-in method that enables the processing of partially defocus-blurred input images for a multitude of NeRF variants. Our method leverages the discrete Laplace operator to detect in-focus regions in images and to estimate in-focus volumes, the Focus Frustums (FFs), on a per-image basis. By iteratively expanding these Focus Frustums, our approach reaches full scene coverage while maintaining high visual quality. NeRF-FF is compatible with accelerated NeRF variants like Instant-NGP, offering visual quality comparable to SOTA methods like Deblur-NeRF [26], DP-NeRF [20] and PDRF [37] while reducing training times below 1 min on end-consumer hardware. This corresponds to a relative speed-up of two orders of magnitude.
Data availability
No datasets were generated or analyzed during the current study.
References
Barron, J.T., Mildenhall, B., Tancik, M., Hedman, P., Martin-Brualla, R., Srinivasan, P.P.: Mip-nerf: A multiscale representation for anti-aliasing neural radiance fields. In: IEEE/CVF ICCV Conference Proceedings, pp. 5855–5864 (2021)
Barron, J.T., Mildenhall, B., Verbin, D., Srinivasan, P.P., Hedman, P.: Mip-nerf 360: Unbounded anti-aliased neural radiance fields. In: IEEE/CVF CVPR Conference Proceedings, pp. 5470–5479 (2022)
von Buelow, M., Tausch, R., Schurig, M., Knauthe, V., Wirth, T., Guthe, S., Santos, P., Fellner, D.W.: Depth-of-field segmentation for near-lossless image compression and 3d reconstruction. J. Comput. Cult. Herit. (2022). https://doi.org/10.1145/3500924
Chen, A., Xu, Z., Geiger, A., Yu, J., Su, H.: Tensorf: Tensorial radiance fields. In: European Conference on Computer Vision, pp. 333–350. Springer (2022)
Chen, J.K., Lyu, J., Wang, Y.X.: Neuraleditor: Editing neural radiance fields via manipulating point clouds. In: IEEE/CVF CVPR Conference Proceedings, pp. 12439–12448 (2023)
Chen, X., Zhang, Q., Li, X., Chen, Y., Feng, Y., Wang, X., Wang, J.: Hallucinated neural radiance fields in the wild. In: IEEE/CVF CVPR Conference Proceedings, pp. 12943–12952 (2022)
Chen, Z., Funkhouser, T., Hedman, P., Tagliasacchi, A.: Mobilenerf: Exploiting the polygon rasterization pipeline for efficient neural field rendering on mobile architectures. In: IEEE/CVF CVPR Conference Proceedings, pp. 16569–16578 (2023)
Dai, P., Zhang, Y., Yu, X., Lyu, X., Qi, X.: Hybrid neural rendering for large-scale scenes with motion blur. In: IEEE/CVF CVPR Conference Proceedings, pp. 154–164 (2023)
Deng, K., Liu, A., Zhu, J.Y., Ramanan, D.: Depth-supervised nerf: Fewer views and faster training for free. In: IEEE/CVF CVPR Conference Proceedings, pp. 12882–12891 (2022)
Fridovich-Keil, S., Yu, A., Tancik, M., Chen, Q., Recht, B., Kanazawa, A.: Plenoxels: Radiance fields without neural networks. In: IEEE/CVF CVPR Conference Proceedings, pp. 5501–5510 (2022)
Huang, X., Zhang, Q., Feng, Y., Li, H., Wang, X., Wang, Q.: Hdr-nerf: High dynamic range neural radiance fields. In: IEEE/CVF CVPR Conference Proceedings, pp. 18398–18408 (2022)
Jambon, C., Kerbl, B., Kopanas, G., Diolatzis, S., Drettakis, G., Leimkühler, T.: Nerfshop: Interactive editing of neural radiance fields. Proceedings of the ACM on Computer Graphics and Interactive Techniques 6(1) (2023)
Jiang, S., Jiang, H., Wang, Z., Luo, H., Chen, W., Xu, L.: Humangen: Generating human radiance fields with explicit priors. In: IEEE/CVF CVPR Conference Proceedings, pp. 12543–12554 (2023)
Jun-Seong, K., Yu-Ji, K., Ye-Bin, M., Oh, T.H.: Hdr-plenoxels: Self-calibrating high dynamic range radiance fields. In: European Conference on Computer Vision, pp. 384–401. Springer (2022)
Kajiya, J.T., Von Herzen, B.P.: Ray tracing volume densities. ACM SIGGRAPH Comput. Graph. 18(3), 165–174 (1984)
Karaali, A., Jung, C.R.: Edge-based defocus blur estimation with adaptive scale selection. IEEE Trans. Image Process. 27(3), 1126–1137 (2018). https://doi.org/10.1109/TIP.2017.2771563
Kerbl, B., Kopanas, G., Leimkühler, T., Drettakis, G.: 3d gaussian splatting for real-time radiance field rendering. ACM Trans. Graph. 42(4), 1–14 (2023)
Kerbl, B., Kopanas, G., Leimkühler, T., Drettakis, G.: 3d gaussian splatting for real-time radiance field rendering. ACM Trans. Graph. 42(4), 1–12 (2023)
Lazova, V., Guzov, V., Olszewski, K., Tulyakov, S., Pons-Moll, G.: Control-nerf: Editable feature volumes for scene rendering and manipulation. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 4340–4350 (2023)
Lee, D., Lee, M., Shin, C., Lee, S.: Dp-nerf: Deblurred neural radiance field with physical scene priors. In: IEEE/CVF CVPR Conference Proceedings, pp. 12386–12396 (2023)
Lee, D., Oh, J., Rim, J., Cho, S., Lee, K.M.: Exblurf: Efficient radiance fields for extreme motion blurred images. In: IEEE/CVF ICCV Conference Proceedings, pp. 17639–17648 (2023)
Li, T., Slavcheva, M., Zollhoefer, M., Green, S., Lassner, C., Kim, C., Schmidt, T., Lovegrove, S., Goesele, M., Newcombe, R., et al.: Neural 3d video synthesis from multi-view video. In: IEEE/CVF CVPR Conference Proceedings, pp. 5521–5531 (2022)
Lin, C.H., Ma, W.C., Torralba, A., Lucey, S.: Barf: Bundle-adjusting neural radiance fields. In: IEEE/CVF ICCV Conference Proceedings, pp. 5741–5751 (2021)
Liu, S., Zhang, X., Zhang, Z., Zhang, R., Zhu, J.Y., Russell, B.: Editing conditional radiance fields. In: IEEE/CVF ICCV Conference Proceedings, pp. 5773–5783 (2021)
Lombardi, S., Simon, T., Schwartz, G., Zollhoefer, M., Sheikh, Y., Saragih, J.: Mixture of volumetric primitives for efficient neural rendering. ACM Trans. Graph. 40(4), 1–13 (2021)
Ma, L., Li, X., Liao, J., Zhang, Q., Wang, X., Wang, J., Sander, P.V.: Deblur-nerf: Neural radiance fields from blurry images. In: IEEE/CVF CVPR Conference Proceedings, pp. 12861–12870 (2022)
Martin-Brualla, R., Radwan, N., Sajjadi, M.S., Barron, J.T., Dosovitskiy, A., Duckworth, D.: Nerf in the wild: Neural radiance fields for unconstrained photo collections. In: IEEE/CVF CVPR Conference Proceedings, pp. 7210–7219 (2021)
Meng, Q., Chen, A., Luo, H., Wu, M., Su, H., Xu, L., He, X., Yu, J.: Gnerf: Gan-based neural radiance field without posed camera. In: IEEE/CVF ICCV Conference Proceedings, pp. 6351–6361 (2021)
Mildenhall, B., Hedman, P., Martin-Brualla, R., Srinivasan, P.P., Barron, J.T.: Nerf in the dark: High dynamic range view synthesis from noisy raw images. In: IEEE/CVF CVPR Conference Proceedings, pp. 16190–16199 (2022)
Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: Nerf: Representing scenes as neural radiance fields for view synthesis. In: ECCV (2020)
Müller, T., Evans, A., Schied, C., Keller, A.: Instant neural graphics primitives with a multiresolution hash encoding. ACM Trans. Graph. 41(4), 1–15 (2022)
Munkberg, J., Hasselgren, J., Shen, T., Gao, J., Chen, W., Evans, A., Müller, T., Fidler, S.: Extracting triangular 3d models, materials, and lighting from images. In: IEEE/CVF CVPR Conference Proceedings, pp. 8280–8290 (2022)
Niemeyer, M., Barron, J.T., Mildenhall, B., Sajjadi, M.S., Geiger, A., Radwan, N.: Regnerf: Regularizing neural radiance fields for view synthesis from sparse inputs. In: IEEE/CVF CVPR Conference Proceedings, pp. 5480–5490 (2022)
Niemeyer, M., Geiger, A.: Giraffe: Representing scenes as compositional generative neural feature fields. In: IEEE/CVF CVPR Conference Proceedings, pp. 11453–11464 (2021)
Park, K., Sinha, U., Barron, J.T., Bouaziz, S., Goldman, D.B., Seitz, S.M., Martin-Brualla, R.: Nerfies: Deformable neural radiance fields. In: IEEE/CVF ICCV Conference Proceedings, pp. 5865–5874 (2021)
Park, K., Sinha, U., Hedman, P., Barron, J.T., Bouaziz, S., Goldman, D.B., Martin-Brualla, R., Seitz, S.M.: Hypernerf: A higher-dimensional representation for topologically varying neural radiance fields. arXiv preprint arXiv:2106.13228 (2021)
Peng, C., Chellappa, R.: Pdrf: progressively deblurring radiance field for fast scene reconstruction from blurry images. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, pp. 2029–2037 (2023)
Peng, S., Dong, J., Wang, Q., Zhang, S., Shuai, Q., Zhou, X., Bao, H.: Animatable neural radiance fields for modeling dynamic human bodies. In: IEEE/CVF ICCV Conference Proceedings, pp. 14314–14323 (2021)
Pumarola, A., Corona, E., Pons-Moll, G., Moreno-Noguer, F.: D-nerf: Neural radiance fields for dynamic scenes. In: IEEE/CVF CVPR Conference Proceedings, pp. 10318–10327 (2021)
Qi, Y., Zhu, L., Zhang, Y., Li, J.: E2nerf: Event enhanced neural radiance fields from blurry images. In: IEEE/CVF ICCV Conference Proceedings, pp. 13254–13264 (2023)
Rebain, D., Matthews, M., Yi, K.M., Lagun, D., Tagliasacchi, A.: Lolnerf: Learn from one look. In: IEEE/CVF CVPR Conference Proceedings, pp. 1558–1567 (2022)
Reiser, C., Peng, S., Liao, Y., Geiger, A.: Kilonerf: Speeding up neural radiance fields with thousands of tiny mlps. In: IEEE/CVF ICCV Conference Proceedings, pp. 14335–14345 (2021)
Rematas, K., Liu, A., Srinivasan, P.P., Barron, J.T., Tagliasacchi, A., Funkhouser, T., Ferrari, V.: Urban radiance fields. In: IEEE/CVF CVPR Conference Proceedings, pp. 12932–12942 (2022)
Schwarz, K., Liao, Y., Niemeyer, M., Geiger, A.: Graf: Generative radiance fields for 3d-aware image synthesis. Adv. Neural. Inf. Process. Syst. 33, 20154–20166 (2020)
Son, H., Lee, J., Cho, S., Lee, S.: Single image defocus deblurring using kernel-sharing parallel atrous convolutions. In: IEEE/CVF ICCV Conference Proceedings, pp. 2642–2650 (2021)
Srinivasan, P.P., Deng, B., Zhang, X., Tancik, M., Mildenhall, B., Barron, J.T.: Nerv: Neural reflectance and visibility fields for relighting and view synthesis. In: IEEE/CVF CVPR Conference Proceedings, pp. 7495–7504 (2021)
Su, S.Y., Yu, F., Zollhöfer, M., Rhodin, H.: A-nerf: Articulated neural radiance fields for learning human shape, appearance, and pose. Adv. Neural. Inf. Process. Syst. 34, 12278–12291 (2021)
Sun, C., Sun, M., Chen, H.T.: Direct voxel grid optimization: Super-fast convergence for radiance fields reconstruction. In: IEEE/CVF CVPR Conference Proceedings, pp. 5459–5469 (2022)
Tancik, M., Casser, V., Yan, X., Pradhan, S., Mildenhall, B., Srinivasan, P.P., Barron, J.T., Kretzschmar, H.: Block-nerf: Scalable large scene neural view synthesis. In: IEEE/CVF CVPR Conference Proceedings, pp. 8248–8258 (2022)
Tancik, M., Weber, E., Ng, E., Li, R., Yi, B., Wang, T., Kristoffersen, A., Austin, J., Salahi, K., Ahuja, A., et al.: Nerfstudio: A modular framework for neural radiance field development. In: ACM SIGGRAPH 2023 Conference Proceedings, pp. 1–12 (2023)
Tretschk, E., Tewari, A., Golyanik, V., Zollhöfer, M., Lassner, C., Theobalt, C.: Non-rigid neural radiance fields: Reconstruction and novel view synthesis of a dynamic scene from monocular video. In: IEEE/CVF ICCV Conference Proceedings, pp. 12959–12970 (2021)
Wang, L., Zhang, J., Liu, X., Zhao, F., Zhang, Y., Zhang, Y., Wu, M., Yu, J., Xu, L.: Fourier plenoctrees for dynamic radiance field rendering in real-time. In: IEEE/CVF CVPR Conference Proceedings, pp. 13524–13534 (2022)
Wang, P., Zhao, L., Ma, R., Liu, P.: Bad-nerf: Bundle adjusted deblur neural radiance fields. In: IEEE/CVF CVPR Conference Proceedings, pp. 4170–4179 (2023)
Wang, Z., Shen, T., Gao, J., Huang, S., Munkberg, J., Hasselgren, J., Gojcic, Z., Chen, W., Fidler, S.: Neural fields meet explicit geometric representations for inverse rendering of urban scenes. In: IEEE/CVF CVPR Conference Proceedings (2023)
Wang, Z., Shen, T., Gao, J., Huang, S., Munkberg, J., Hasselgren, J., Gojcic, Z., Chen, W., Fidler, S.: Neural fields meet explicit geometric representations for inverse rendering of urban scenes. In: IEEE/CVF CVPR Conference Proceedings, pp. 8370–8380 (2023)
Wang, Z., Wu, S., Xie, W., Chen, M., Prisacariu, V.A.: Nerf--: Neural radiance fields without known camera parameters. arXiv preprint arXiv:2102.07064 (2021)
Warburg, F., Weber, E., Tancik, M., Holynski, A., Kanazawa, A.: Nerfbusters: Removing ghostly artifacts from casually captured nerfs. arXiv preprint arXiv:2304.10532 (2023)
Wirth, T., Rak, A., Knauthe, V., Fellner, D.W.: A post processing technique to automatically remove floater artifacts in neural radiance fields. In: Computer Graphics Forum, p. e14977. Wiley Online Library (2023)
Wu, Z., Li, X., Peng, J., Lu, H., Cao, Z., Zhong, W.: Dof-nerf: Depth-of-field meets neural radiance fields. In: Proceedings of the 30th ACM International Conference on Multimedia, pp. 1718–1729 (2022)
Yang, B., Bao, C., Zeng, J., Bao, H., Zhang, Y., Cui, Z., Zhang, G.: Neumesh: Learning disentangled neural mesh-based implicit field for geometry and texture editing. In: Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XVI, pp. 597–614. Springer (2022)
Yu, A., Li, R., Tancik, M., Li, H., Ng, R., Kanazawa, A.: Plenoctrees for real-time rendering of neural radiance fields. In: IEEE/CVF ICCV Conference Proceedings, pp. 5752–5761 (2021)
Yu, A., Ye, V., Tancik, M., Kanazawa, A.: pixelnerf: Neural radiance fields from one or few images. In: IEEE/CVF CVPR Conference Proceedings, pp. 4578–4587 (2021)
Yuan, Y.J., Sun, Y.T., Lai, Y.K., Ma, Y., Jia, R., Gao, L.: Nerf-editing: geometry editing of neural radiance fields. In: IEEE/CVF CVPR Conference Proceedings, pp. 18353–18364 (2022)
Zeng, C., Chen, G., Dong, Y., Peers, P., Wu, H., Tong, X.: Relighting neural radiance fields with shadow and highlight hints. In: ACM SIGGRAPH 2023 Conference Proceedings, pp. 1–11 (2023)
Zhang, J., Liu, X., Ye, X., Zhao, F., Zhang, Y., Wu, M., Zhang, Y., Xu, L., Yu, J.: Editable free-viewpoint video using a layered neural representation. ACM Trans. Graph. 40(4), 1–18 (2021)
Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: IEEE/CVF CVPR Conference Proceedings, pp. 586–595 (2018)
Zhang, X., Srinivasan, P.P., Deng, B., Debevec, P., Freeman, W.T., Barron, J.T.: Nerfactor: Neural factorization of shape and reflectance under an unknown illumination. ACM Trans. Graph. 40(6), 1–18 (2021)
Zhao, F., Yang, W., Zhang, J., Lin, P., Zhang, Y., Yu, J., Xu, L.: Humannerf: Efficiently generated human radiance field from sparse inputs. In: IEEE/CVF CVPR Conference Proceedings, pp. 7743–7753 (2022)
Acknowledgements
We thank the anonymous reviewers for their guidance and encouraging feedback.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Author information
Contributions
T.W. contributed to conceptualization, investigation, methodology, writing—original draft, and writing—review and editing. A.R. was involved in investigation, methodology, experimental design, writing—original draft, and writing—review and editing, and provided software. M.B. contributed to Laplacian operator and writing—review and editing. V.K., A.K., and D.W.F. were involved in guidance and feedback during the research process and writing—review and editing.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Supplementary Information
Below is the link to the electronic supplementary material.
Supplementary file 1 (mp4 5475 KB)
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Wirth, T., Rak, A., von Buelow, M. et al. NeRF-FF: a plug-in method to mitigate defocus blur for runtime optimized neural radiance fields. Vis Comput 40, 5043–5055 (2024). https://doi.org/10.1007/s00371-024-03507-y