Abstract
In real-time rendering applications, mesh rendering quality is constrained by limited GPU memory capacity and display resolution. As models have grown more complex and the demand for higher display resolutions has increased, practitioners have begun building low-cost commodity workstations with multiple GPUs. More GPU memory is then available across the GPUs, and a higher display resolution can be achieved by connecting each GPU to a display monitor, yielding a large tiled display configuration. However, a multi-GPU workstation may not efficiently handle a complex model that cannot fit into GPU memory, due to (1) the unified configuration, which treats the GPUs as one hardware entity and requires the same data to be replicated on all GPUs, and (2) the lack of scalability to reduce, balance, and stream data dynamically between the CPU and GPUs as well as among the GPUs. In this work, we present a fine-grained parallel rendering approach that integrates a view-dependent LOD selection strategy with an inter-GPU load-balancing method to ensure that each GPU handles only the portion of data it rasterizes, without data replication. A new multi-GPU out-of-core method minimizes the amount of data transferred from the CPU to each GPU by taking advantage of frame-to-frame coherence. A comprehensive evaluation is presented to assess the efficiency and scalability of the execution components over extremely large scenes.
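As a rough illustration of the frame-to-frame coherence idea mentioned in the abstract, the following Python sketch computes, for a single GPU, which data elements must be uploaded in the current frame given what is already resident from the previous frame. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `plan_transfers` and `advance_frame`, the use of integer element IDs, and the immediate eviction of elements that are no longer needed are all hypothetical.

```python
from typing import Set, Tuple


def plan_transfers(resident: Set[int], needed: Set[int]) -> Tuple[Set[int], Set[int]]:
    """Return (upload, evict) sets of element IDs for one GPU.

    resident: IDs already in this GPU's memory from the previous frame.
    needed:   IDs this GPU must rasterize in the current frame.
    """
    upload = needed - resident   # only data not already on the GPU crosses the bus
    evict = resident - needed    # assumption: unneeded data is freed immediately
    return upload, evict


def advance_frame(resident: Set[int], needed: Set[int]) -> Tuple[Set[int], Set[int]]:
    """Apply one frame's transfer plan; return (new_resident, uploaded)."""
    upload, evict = plan_transfers(resident, needed)
    new_resident = (resident - evict) | upload
    return new_resident, upload


if __name__ == "__main__":
    # Hypothetical example: consecutive frames share most of their data.
    frame0 = {1, 2, 3, 4}
    frame1 = {2, 3, 4, 5}
    resident, uploaded = advance_frame(set(), frame0)
    print(len(uploaded))   # first frame uploads everything it needs
    resident, uploaded = advance_frame(resident, frame1)
    print(sorted(uploaded))  # second frame uploads only the new element
```

When consecutive frames need mostly the same elements, the upload set stays small relative to the working set, which is the property such an out-of-core scheme would rely on.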
Data Availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
This work was supported by the National Science Foundation under Grant CNS-1464323. We thank NVIDIA for donating the GPU device used in this work to run our algorithms and produce the experimental results. The Power Plant model is provided courtesy of the University of North Carolina at Chapel Hill. We also thank the RIT MAGIC Center for its technical and logistical support of this research.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest/competing interests to declare that are relevant to the content of this article.
Supplementary Information
Below is the link to the electronic supplementary material.
Supplementary file 1 (mp4 82067 KB)
About this article
Cite this article
Dong, Y., Peng, C. Multi-GPU multi-display rendering of extremely large 3D environments. Vis Comput 39, 6473–6489 (2023). https://doi.org/10.1007/s00371-022-02740-7
DOI: https://doi.org/10.1007/s00371-022-02740-7