Repdistiller: Knowledge Distillation Scaled by Re-parameterization for Crowd Counting

Ni, Tian; Cao, Yuchen; Liang, Xiaoyu; Hu, Haoji

doi:10.1007/978-981-99-8549-4_32

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14434)

Included in the following conference series:

Chinese Conference on Pattern Recognition and Computer Vision (PRCV)

Abstract

Knowledge distillation (KD) is an important method for compressing a large teacher model into a much smaller student model. However, a large capacity gap between the teacher and student models degrades KD performance across a variety of tasks. In this paper, we propose Repdistiller, a knowledge distillation framework that incorporates structural re-parameterization to alleviate this capacity gap. During training, Repdistiller lets the student model search for parallel branches, which increases the student's training-time capacity and thus narrows the gap between teacher and student. After distillation, the searched branches are merged back into the student network, so inference incurs no additional computational overhead. Using crowd counting as an example task, Repdistiller achieves state-of-the-art performance on the ShanghaiTech and UCF-QNRF datasets, outperforming many well-established knowledge distillation methods.
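
The claim that merged branches add no inference cost rests on the linearity of convolution and of inference-mode batch normalization. The following is a minimal sketch of that general merge, in the style popularised by RepVGG, and not the authors' implementation. It assumes, since the abstract does not say which branches are searched, a 3×3 convolution and a 1×1 convolution, each followed by BatchNorm. Each BatchNorm is folded into its convolution, the 1×1 kernel is zero-padded to 3×3, and kernels and biases are summed. The helper names (`conv2d`, `batchnorm`, `fuse_conv_bn`, `merge_branches`) are hypothetical.

```python
import numpy as np

def conv2d(x, w, b, pad):
    """Naive stride-1 2D cross-correlation (the 'convolution' of DL frameworks).
    x: (C_in, H, W), w: (C_out, C_in, k, k), b: (C_out,)"""
    c_out, c_in, k, _ = w.shape
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    h_out, w_out = xp.shape[1] - k + 1, xp.shape[2] - k + 1
    y = np.zeros((c_out, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = xp[:, i:i + k, j:j + k]  # (C_in, k, k)
            y[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return y + b[:, None, None]

def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode BatchNorm using stored running statistics."""
    std = np.sqrt(var + eps)
    return (gamma[:, None, None] * (y - mean[:, None, None]) / std[:, None, None]
            + beta[:, None, None])

def fuse_conv_bn(w, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm into the preceding bias-free conv: returns (w', b')."""
    scale = gamma / np.sqrt(var + eps)
    return w * scale[:, None, None, None], beta - mean * scale

def merge_branches(w3, bn3, w1, bn1):
    """Assumed branch pair: (3x3 conv + BN) + (1x1 conv + BN) -> one 3x3 conv."""
    w3f, b3f = fuse_conv_bn(w3, *bn3)
    w1f, b1f = fuse_conv_bn(w1, *bn1)
    w1_as_3x3 = np.pad(w1f, ((0, 0), (0, 0), (1, 1), (1, 1)))  # 1x1 kernel at the centre
    return w3f + w1_as_3x3, b3f + b1f

rng = np.random.default_rng(0)
c_in, c_out = 3, 4
x = rng.standard_normal((c_in, 8, 8))
w3 = rng.standard_normal((c_out, c_in, 3, 3))
w1 = rng.standard_normal((c_out, c_in, 1, 1))

def rand_bn():
    # (gamma, beta, running_mean, running_var); variances kept positive
    return (rng.standard_normal(c_out), rng.standard_normal(c_out),
            rng.standard_normal(c_out), rng.uniform(0.5, 2.0, c_out))

bn3, bn1 = rand_bn(), rand_bn()
zero = np.zeros(c_out)

# Training-time form: two parallel branches whose outputs are summed.
y_multi = (batchnorm(conv2d(x, w3, zero, pad=1), *bn3)
           + batchnorm(conv2d(x, w1, zero, pad=0), *bn1))

# Inference-time form: a single 3x3 conv with merged weights and bias.
w_merged, b_merged = merge_branches(w3, bn3, w1, bn1)
y_single = conv2d(x, w_merged, b_merged, pad=1)
print(np.allclose(y_multi, y_single))  # True
```

Because the merged layer is an ordinary 3×3 convolution, the deployed student has the same cost as if the extra branches had never existed; only training pays for them.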

Author information

Authors and Affiliations

College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, China
Tian Ni, Yuchen Cao, Xiaoyu Liang & Haoji Hu

Corresponding author

Correspondence to Haoji Hu.

Editor information

Editors and Affiliations

Nanjing University of Information Science and Technology, Nanjing, China
Qingshan Liu
Xiamen University, Xiamen, China
Hanzi Wang
Beijing University of Posts and Telecommunications, Beijing, China
Zhanyu Ma
Sun Yat-sen University, Guangzhou, China
Weishi Zheng
Peking University, Beijing, China
Hongbin Zha
Chinese Academy of Sciences, Beijing, China
Xilin Chen
Chinese Academy of Sciences, Beijing, China
Liang Wang
Xiamen University, Xiamen, China
Rongrong Ji

About this paper

Cite this paper

Ni, T., Cao, Y., Liang, X., Hu, H. (2024). Repdistiller: Knowledge Distillation Scaled by Re-parameterization for Crowd Counting. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14434. Springer, Singapore. https://doi.org/10.1007/978-981-99-8549-4_32

DOI: https://doi.org/10.1007/978-981-99-8549-4_32
Published: 25 December 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8548-7
Online ISBN: 978-981-99-8549-4
eBook Packages: Computer Science, Computer Science (R0)
