Spatiotemporal Object Detection and Activity Recognition

Kumar, Vimal; Jain, Shobhit; Lillis, David

doi:10.1007/978-981-99-9651-3_6

Part of the book series: Big Data Management (BIGDM)

Abstract

Spatiotemporal object detection and activity recognition are essential components of computer vision, with applications spanning surveillance, autonomous driving, and smart stores. This chapter offers a comprehensive overview of the techniques and applications associated with these concepts. Beginning with the fundamental principles of object detection and activity recognition, we discuss the challenges and limitations of existing methods. The chapter then explores spatiotemporal object detection and activity recognition, which entails capturing both spatial and temporal information about moving objects in video data. We propose a hierarchical model for spatiotemporal object detection and activity recognition that is designed to maintain spatial and temporal connectivity across frames. The chapter also outlines metrics for evaluating the performance of object detection and activity recognition models, so that their accuracy and effectiveness can be assessed in real-world applications. Finally, we highlight the potential for further research and development in these application areas. By presenting a hierarchical model and performance evaluation metrics, the chapter serves as a resource for researchers and practitioners applying computer vision in a variety of domains.

Author information

Authors and Affiliations

University College Dublin, Dublin, Ireland
Vimal Kumar & David Lillis
The University of Texas at Dallas, Richardson, TX, USA
Shobhit Jain

Corresponding author

Correspondence to Vimal Kumar.

Editor information

Editors and Affiliations

Computing Science and Information Engineering, National Chung Cheng University, Chiayi County, Taiwan
John A
Computer Science and Engineering, Amity University, Mumbai, Maharashtra, India
Satheesh Abimannan
College of Computing and Mathematics, King Fahd University of Petroleum and Mi, Dhahran, Saudi Arabia
El-Sayed M. El-Alfy
Computing Science and Information Engineering, National Taipei University, New Taipei City, Taiwan
Yue-Shan Chang

About this chapter

Cite this chapter

Kumar, V., Jain, S., Lillis, D. (2024). Spatiotemporal Object Detection and Activity Recognition. In: A, J., Abimannan, S., El-Alfy, ES.M., Chang, YS. (eds) Spatiotemporal Data Analytics and Modeling. Big Data Management. Springer, Singapore. https://doi.org/10.1007/978-981-99-9651-3_6

DOI: https://doi.org/10.1007/978-981-99-9651-3_6
Published: 16 April 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-9650-6
Online ISBN: 978-981-99-9651-3
eBook Packages: Computer Science, Computer Science (R0)
