Abstract
In crowd counting datasets, location labels are costly to obtain, yet they are not used by the evaluation metrics. Moreover, existing multi-task approaches employ high-level tasks to improve counting accuracy, a trend that further increases the demand for annotations. In this paper, we propose a weakly-supervised counting network that directly regresses crowd numbers without location supervision. We further train the network to count by exploiting the relationships among images: we propose a soft-label sorting network, trained alongside the counting network, that sorts the given images by their crowd numbers. The sorting network explicitly drives the shared backbone CNN to become sensitive to density. The proposed method therefore improves counting accuracy by exploiting the information hidden in crowd numbers rather than by learning from extra labels such as locations and perspectives. We evaluate the proposed method on three crowd counting datasets, and its performance compares favorably with fully supervised state-of-the-art approaches.
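The abstract does not spell out the form of the sorting objective, so the following is only a hedged illustration of how count regression and a differentiable sorting loss can be combined using nothing but crowd numbers as labels. It assumes the NeuralSort relaxation of Grover et al. (ICLR 2019), one published way to relax a sorting permutation into a row-stochastic matrix; it is not necessarily the authors' soft-label sorting network. The function names (`neural_sort`, `true_permutation`, `sorting_loss`, `counting_loss`, `total_loss`), the separate `sort_scores` output of a sorting head, the L1 count loss, the weight `lam`, and the temperature `tau` are all hypothetical. The sketch is plain Python and shows only the forward computation; an actual training loop would compute the same quantities in an autodiff framework.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def neural_sort(scores: Sequence[float], tau: float = 1.0) -> Matrix:
    """NeuralSort relaxation (Grover et al., 2019) of a descending sort.

    Row i (0-based) is a softmax over images approximating "which image has
    the (i+1)-th largest score"; rows approach one-hot vectors as tau -> 0.
    """
    n = len(scores)
    # (A_s 1)_j: total absolute difference between score j and all scores.
    dist = [sum(abs(sj - sk) for sk in scores) for sj in scores]
    rows: Matrix = []
    for i in range(1, n + 1):
        scale = n + 1 - 2 * i  # the (n + 1 - 2i) factor from NeuralSort
        logits = [(scale * sj - dj) / tau for sj, dj in zip(scores, dist)]
        m = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(x - m) for x in logits]
        z = sum(exps)
        rows.append([e / z for e in exps])
    return rows


def true_permutation(counts: Sequence[float]) -> Matrix:
    """Hard permutation matrix sorting ground-truth counts in descending order."""
    n = len(counts)
    order = sorted(range(n), key=lambda j: -counts[j])  # ties keep input order
    return [[1.0 if j == order[i] else 0.0 for j in range(n)] for i in range(n)]


def sorting_loss(sort_scores, true_counts, tau=1.0, eps=1e-12) -> float:
    """Row-wise cross-entropy between the true and the relaxed permutation."""
    p_hat = neural_sort(sort_scores, tau)
    p_true = true_permutation(true_counts)
    total = 0.0
    for t_row, p_row in zip(p_true, p_hat):
        total -= sum(t * math.log(p + eps) for t, p in zip(t_row, p_row))
    return total / len(p_true)


def counting_loss(pred_counts, true_counts) -> float:
    """Mean absolute error of directly regressed counts (no location labels)."""
    errs = [abs(p - t) for p, t in zip(pred_counts, true_counts)]
    return sum(errs) / len(errs)


def total_loss(pred_counts, sort_scores, true_counts, lam=0.1, tau=1.0) -> float:
    """Hypothetical joint objective: count regression plus weighted sorting loss."""
    return counting_loss(pred_counts, true_counts) + lam * sorting_loss(
        sort_scores, true_counts, tau
    )


if __name__ == "__main__":
    true_counts = [120.0, 45.0, 300.0]
    pred_counts = [110.0, 50.0, 280.0]
    good_scores = [1.2, 0.4, 3.0]  # same order as the true counts
    bad_scores = [0.4, 3.0, 1.2]   # wrong order
    print(total_loss(pred_counts, good_scores, true_counts))
    print(total_loss(pred_counts, bad_scores, true_counts))
```

Running the script prints a smaller joint loss for `good_scores` than for `bad_scores`; the count term is identical in both calls, so the difference comes entirely from the sorting term. The temperature `tau` trades smoother gradients against how closely each row approximates a one-hot selection, and the values used here are arbitrary.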
Acknowledgements
This work was supported in part by the Italy-China collaboration project TALENT: 2018YFE0118400, in part by the National Natural Science Foundation of China: 61620106009, 61772494, 61931008, U1636214, 61836002 and 61976069, in part by the Key Research Program of Frontier Sciences, CAS: QYZDJ-SSW-SYS013, and in part by the Youth Innovation Promotion Association CAS.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Yang, Y., Li, G., Wu, Z., Su, L., Huang, Q., Sebe, N. (2020). Weakly-Supervised Crowd Counting Learns from Sorting Rather Than Locations. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12353. Springer, Cham. https://doi.org/10.1007/978-3-030-58598-3_1
DOI: https://doi.org/10.1007/978-3-030-58598-3_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58597-6
Online ISBN: 978-3-030-58598-3
eBook Packages: Computer Science, Computer Science (R0)