SD-Net: Understanding overcrowded scenes in real-time via an efficient dilated convolutional neural network

Khan, Noman; Ullah, Amin; Haq, Ijaz Ul; Menon, Varun G.; Baik, Sung Wook

doi:10.1007/s11554-020-01020-8

Special Issue Paper
Published: 28 September 2020

Volume 18, pages 1729–1743, (2021)
Journal of Real-Time Image Processing

Noman Khan¹, Amin Ullah¹, Ijaz Ul Haq¹, Varun G. Menon² & Sung Wook Baik¹

Abstract

Advances in computer vision have drawn many researchers to surveillance applications, particularly the automated analysis of crowded scenes, such as crowd counting in highly congested scenes. The main goal of crowd counting is to count or estimate the number of people in a particular scene. Understanding overcrowded scenes in real time is important for prompt responsive action. However, the task is very difficult because of several key challenges, including cluttered backgrounds, occlusion, variations in human pose and scale, and limited surveillance training data, which are inadequately addressed in the existing literature. To tackle these challenges, we introduce “SD-Net”, an end-to-end CNN architecture that produces high-quality density maps in real time and effectively counts people in extremely overcrowded scenes. The proposed architecture consists of depthwise separable, standard, and dilated 2D convolutional layers. Depthwise separable and standard 2D convolutional layers are used to extract 2D features. Instead of pooling layers, dilated 2D convolutional layers are employed, which results in large receptive fields and reduces the number of parameters. Our CNN architecture is evaluated on four publicly available crowd analysis datasets and demonstrates superiority over the state of the art in terms of accuracy and model size.
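
The abstract's two efficiency arguments (depthwise separable layers need fewer weights, and dilated layers widen the receptive field without pooling) can be checked with simple arithmetic. The following is a minimal sketch in plain Python; the channel counts, kernel sizes, and dilation rates are illustrative assumptions, not the actual SD-Net configuration, which the abstract does not specify.

```python
def conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weight count of a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


def receptive_field(layers):
    """Receptive field of stacked conv layers.

    layers: list of (kernel_size, dilation, stride) tuples, input to output.
    """
    rf, jump = 1, 1
    for k, d, s in layers:
        rf += (k - 1) * d * jump  # a dilated kernel spans (k - 1) * d + 1 inputs
        jump *= s                 # stride multiplies the step between outputs
    return rf


# Hypothetical layer sizes, chosen only for illustration.
print(conv_params(64, 128, 3))                 # 73728
print(depthwise_separable_params(64, 128, 3))  # 8768
print(receptive_field([(3, 1, 1)] * 3))        # 7  (three plain 3x3 layers)
print(receptive_field([(3, 2, 1)] * 3))        # 13 (same layers, dilation 2)
```

Under these assumed sizes, the dilated stack nearly doubles the receptive field at the same weight count as the plain stack and without any downsampling, while the depthwise separable layer uses roughly one eighth of the weights of the standard layer.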

Acknowledgements

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (No. 2019R1A2B5B01070067).

Author information

Authors and Affiliations

Intelligent Media Laboratory, Digital Contents Research Institute, Sejong University, Seoul, Republic of Korea
Noman Khan, Amin Ullah, Ijaz Ul Haq & Sung Wook Baik
Department of Computer Science and Engineering, SCMS School of Engineering and Technology, Ernakulam, 683576, India
Varun G. Menon

Corresponding author

Correspondence to Sung Wook Baik.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Khan, N., Ullah, A., Haq, I.U. et al. SD-Net: Understanding overcrowded scenes in real-time via an efficient dilated convolutional neural network. J Real-Time Image Proc 18, 1729–1743 (2021). https://doi.org/10.1007/s11554-020-01020-8

Received: 10 May 2020
Accepted: 13 September 2020
Published: 28 September 2020
Issue Date: October 2021
DOI: https://doi.org/10.1007/s11554-020-01020-8
