Abstract
Model quantization is widely used to compress deep neural networks and accelerate their inference. Post-training quantization methods have attracted considerable attention because they are simple to use. However, quantizing a model below 8 bits causes significant accuracy degradation. This paper addresses the problem by building mixed-precision inference networks based on key activation layer selection. During post-training quantization, key activation layers are quantized to 8-bit precision and non-key activation layers to 4-bit precision. The experimental results show a substantial improvement with our method. Relative to ResNet-50 (W8A8) and VGG-16 (W8A8), the proposed method accelerates inference and lowers power consumption with only a small accuracy loss.
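The mixed-precision assignment described in the abstract can be illustrated with a minimal sketch. The abstract does not state how key layers are selected, so the criterion below (treating the layers with the largest 4-bit quantization error on calibration data as key) is purely an assumption for illustration, not the paper's method. The function names (`fake_quantize`, `select_key_layers`, `assign_bit_widths`), the layer names, and the synthetic calibration data are likewise hypothetical.

```python
import random


def fake_quantize(values, num_bits):
    """Uniform affine quantization to num_bits, then dequantization."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(values)  # constant tensor: nothing to quantize
    levels = 2 ** num_bits - 1
    scale = (hi - lo) / levels
    return [round((v - lo) / scale) * scale + lo for v in values]


def mse(a, b):
    """Mean squared error between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def select_key_layers(activations, num_key):
    """ASSUMED criterion (not from the paper): the num_key layers whose
    activations lose the most under 4-bit quantization are treated as key."""
    errors = {
        name: mse(acts, fake_quantize(acts, 4))
        for name, acts in activations.items()
    }
    ranked = sorted(errors, key=errors.get, reverse=True)
    return set(ranked[:num_key])


def assign_bit_widths(activations, key_layers):
    """Key activation layers get 8 bits, all other layers get 4 bits."""
    return {name: 8 if name in key_layers else 4 for name in activations}


# Synthetic calibration activations for three hypothetical layers.
random.seed(0)
calibration = {
    "conv1": [random.gauss(0.0, 4.0) for _ in range(1000)],
    "conv2": [random.gauss(0.0, 1.0) for _ in range(1000)],
    "conv3": [random.gauss(0.0, 0.1) for _ in range(1000)],
}

key_layers = select_key_layers(calibration, num_key=1)
bit_widths = assign_bit_widths(calibration, key_layers)
print(bit_widths)
```

In this sketch the number of key layers (`num_key`) is the knob that trades accuracy against speed and power: raising it moves the network toward a uniform 8-bit activation configuration, while lowering it moves toward uniform 4-bit activations.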
Acknowledgements
The work was funded by the Key R&D Plan of Shandong Province (No. 2019JZZY011101).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Liang, L. (2020). Post Training Mixed-Precision Quantization Based on Key Layers Selection. In: Bartoli, A., Fusiello, A. (eds) Computer Vision – ECCV 2020 Workshops. ECCV 2020. Lecture Notes in Computer Science(), vol 12539. Springer, Cham. https://doi.org/10.1007/978-3-030-68238-5_9
DOI: https://doi.org/10.1007/978-3-030-68238-5_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-68237-8
Online ISBN: 978-3-030-68238-5
eBook Packages: Computer Science, Computer Science (R0)