PalQuant: Accelerating High-Precision Networks on Low-Precision Accelerators

  • Conference paper
Computer Vision – ECCV 2022 (ECCV 2022)

Abstract

Recently, low-precision deep learning accelerators (DLAs) have become popular due to their advantages in chip area and energy consumption, yet low-precision quantized models on these DLAs suffer severe accuracy degradation. One way to achieve both high accuracy and efficient inference is to deploy high-precision neural networks on low-precision DLAs, which is rarely studied. In this paper, we propose the PArallel Low-precision Quantization (PalQuant) method, which approximates high-precision computations by learning parallel low-precision representations from scratch. In addition, we present a novel cyclic shuffle module to boost cross-group information communication between parallel low-precision groups. Extensive experiments demonstrate that PalQuant outperforms state-of-the-art quantization methods in both accuracy and inference speed; e.g., for ResNet-18 quantization, PalQuant obtains 0.52% higher accuracy and a 1.78\(\times \) speedup simultaneously over the 4-bit counterpart on a state-of-the-art 2-bit accelerator. Code is available at https://github.com/huqinghao/PalQuant.
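To make the idea of parallel low-precision groups with a cyclic shuffle concrete, below is a rough PyTorch-style sketch. It is illustrative only: the fake-quantizer, the way branches are combined, and the channel-roll used to stand in for the cyclic shuffle are assumptions, not the paper's exact design (see the official repository for the real implementation).

```python
# Rough sketch (illustrative assumptions, not the paper's exact design):
# a high-precision layer is replaced by N parallel low-precision branches
# whose outputs are combined, with a cyclic shift of channels across
# branches so that information can flow between groups.
import torch
import torch.nn as nn

class FakeQuant(nn.Module):
    """Uniform fake-quantizer to 2**bits levels in [0, 1] (straight-through)."""
    def __init__(self, bits: int = 2):
        super().__init__()
        self.levels = 2 ** bits - 1

    def forward(self, x):
        x = x.clamp(0, 1)
        q = torch.round(x * self.levels) / self.levels
        return x + (q - x).detach()  # straight-through estimator

class ParallelLowPrecisionBlock(nn.Module):
    def __init__(self, channels: int, groups: int = 2, bits: int = 2):
        super().__init__()
        self.groups = groups
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False)
            for _ in range(groups)
        )
        self.quant = FakeQuant(bits)

    def forward(self, x):
        outs = [branch(self.quant(x)) for branch in self.branches]
        # Assumed form of the "cyclic shuffle": rotate each branch's channels
        # by a branch-dependent offset before merging the groups.
        step = outs[0].shape[1] // self.groups
        outs = [torch.roll(o, shifts=i * step, dims=1) for i, o in enumerate(outs)]
        return sum(outs)

block = ParallelLowPrecisionBlock(channels=8, groups=2, bits=2)
print(block(torch.rand(1, 8, 16, 16)).shape)  # torch.Size([1, 8, 16, 16])
```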

Q. Hu, G. Li and Q. Wu—Equal Contribution.


Notes

  1. For example, given a 4-bit number x, the lowest 2-bit digit changes from 3 to 0 when x changes from 3 to 4 (\(\left[ 0\ 0\ 1 \ \ 1\right] \rightarrow \left[ 0\ 1\ 0 \ 0\right] \)).
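The following minimal Python snippet (illustrative, not from the paper) makes the note concrete: it slices a 4-bit unsigned integer into two 2-bit digits and shows the discontinuity at the 3 → 4 boundary.

```python
# Minimal sketch: slice a 4-bit unsigned integer x into two 2-bit digits,
# so that x = 4 * high + low.
def two_bit_digits(x: int) -> tuple[int, int]:
    low = x & 0b11          # lowest 2 bits
    high = (x >> 2) & 0b11  # highest 2 bits
    return high, low

for x in (3, 4):
    high, low = two_bit_digits(x)
    print(f"x={x}: bits={x:04b}, high={high}, low={low}")
# x=3: bits=0011, high=0, low=3
# x=4: bits=0100, high=1, low=0
# The low digit jumps from 3 to 0, so the 2-bit digits do not vary smoothly with x.
```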


Acknowledgements

This work was supported in part by the National Key Research and Development Program of China (Grant No. 2021ZD0201504) and the National Natural Science Foundation of China (Grant No. 62106267).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Jian Cheng.

Editor information

Editors and Affiliations

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 166 KB)


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Hu, Q., Li, G., Wu, Q., Cheng, J. (2022). PalQuant: Accelerating High-Precision Networks on Low-Precision Accelerators. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13671. Springer, Cham. https://doi.org/10.1007/978-3-031-20083-0_19

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-20083-0_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20082-3

  • Online ISBN: 978-3-031-20083-0

  • eBook Packages: Computer Science, Computer Science (R0)
