Abstract
Spatiotemporal object detection and activity recognition are core problems in computer vision, with applications in surveillance, autonomous driving, and smart stores. This chapter surveys the techniques and applications associated with both tasks. It begins with the fundamental principles of object detection and activity recognition and discusses the challenges and limitations of existing methods. It then turns to spatiotemporal object detection and activity recognition, which capture both the spatial and the temporal information of moving objects in video data. A hierarchical model is proposed that is designed to maintain spatial and temporal connectivity across frames. The chapter also outlines metrics for evaluating object detection and activity recognition models, so that their accuracy and effectiveness can be assessed in real-world applications. Finally, it highlights the significance of these techniques in surveillance, autonomous driving, and smart stores, and points to the potential for further research and development. By presenting a hierarchical model together with performance evaluation metrics, the chapter aims to serve as a resource for researchers and practitioners applying computer vision across a variety of domains.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Kumar, V., Jain, S., Lillis, D. (2024). Spatiotemporal Object Detection and Activity Recognition. In: A, J., Abimannan, S., El-Alfy, ES.M., Chang, YS. (eds) Spatiotemporal Data Analytics and Modeling. Big Data Management. Springer, Singapore. https://doi.org/10.1007/978-981-99-9651-3_6
DOI: https://doi.org/10.1007/978-981-99-9651-3_6
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-9650-6
Online ISBN: 978-981-99-9651-3
eBook Packages: Computer Science, Computer Science (R0)