Abstract
This paper compares the inference performance of deep convolutional neural networks across commonly used software libraries. Experiments are carried out on a high-end server with two Intel Xeon Platinum 8260L 2.4 GHz CPUs (48 cores in total). The performance analysis uses the ResNet-50 and GoogleNet-v3 models. Inference is implemented with the Intel Distribution of Caffe, TensorFlow, PyTorch, MXNet, OpenCV, and the Intel Distribution of OpenVINO toolkit. We compare the total run time and the number of frames processed per second, and we examine the strong scaling efficiency when using up to 48 CPU cores. The experiments show that OpenVINO provides the best performance and scales well up to 48 cores. We also observe that, in OpenVINO, the throughput mode accelerates inference relative to the latency mode by a factor that ranges from 4.9x at an image batch size of 1 to 1.4x at an image batch size of 32. Finally, we find that INT8 quantization in OpenVINO substantially improves inference performance while maintaining almost the same classification quality.
The paper is recommended for publication by the Program Committee of the international conference Mathematical Modelling and Supercomputing Technologies-2020.
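The two metrics named in the abstract, frames per second and strong scaling efficiency, can be illustrated with a minimal Python sketch. It assumes the textbook definitions (throughput as frames divided by total run time, efficiency as speedup divided by core count for a fixed workload); the paper's exact measurement procedure may differ. The function names and all timing values below are hypothetical and are not results from the paper.

```python
def frames_per_second(num_frames: int, total_seconds: float) -> float:
    """Throughput: number of processed frames divided by total run time."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return num_frames / total_seconds


def strong_scaling_efficiency(t_one_core: float, t_n_cores: float, n_cores: int) -> float:
    """Speedup on n cores divided by n, for a fixed total workload."""
    if n_cores < 1 or t_one_core <= 0 or t_n_cores <= 0:
        raise ValueError("core count and run times must be positive")
    speedup = t_one_core / t_n_cores
    return speedup / n_cores


# Hypothetical run times in seconds for the same workload (NOT measured data).
num_frames = 1000
run_time = {1: 96.0, 48: 2.5}

fps_48 = frames_per_second(num_frames, run_time[48])
eff_48 = strong_scaling_efficiency(run_time[1], run_time[48], 48)

print(f"FPS on 48 cores: {fps_48:.1f}")
print(f"Strong scaling efficiency on 48 cores: {eff_48:.2f}")
```

An efficiency close to 1 means that adding cores reduces run time almost proportionally, which is the sense in which a library "scales well".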
Acknowledgements
I.M. and V.V. acknowledge support of Russian Government Grant No. 0729-2020-0055. E.K., E.V., and V.K. acknowledge support of Intel Corporation. The authors are grateful to N. Ageeva, Yu. Gorbachev, K. Korniakov, and Z. Matveev for valuable comments. The experiments were performed on the Intel Endeavor supercomputer at Intel and the Lobachevsky supercomputer at UNN.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Vasiliev, E.P., Kustikova, V.D., Volokitin, V.D., Kozinov, E.A., Meyerov, I.B. (2021). Performance Analysis of Deep Learning Inference in Convolutional Neural Networks on Intel Cascade Lake CPUs. In: Balandin, D., Barkalov, K., Gergel, V., Meyerov, I. (eds) Mathematical Modeling and Supercomputer Technologies. MMST 2020. Communications in Computer and Information Science, vol 1413. Springer, Cham. https://doi.org/10.1007/978-3-030-78759-2_29
DOI: https://doi.org/10.1007/978-3-030-78759-2_29
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-78758-5
Online ISBN: 978-3-030-78759-2
eBook Packages: Computer Science, Computer Science (R0)