Towards Frame Rate Agnostic Multi-object Tracking

Feng, Weitao; Bai, Lei; Yao, Yongqiang; Yu, Fengwei; Ouyang, Wanli

doi:10.1007/s11263-023-01943-2

Towards Frame Rate Agnostic Multi-object Tracking

Published: 20 November 2023

Volume 132, pages 1443–1462, (2024)
Cite this article

International Journal of Computer Vision Aims and scope Submit manuscript

Weitao Feng¹,
Lei Bai ORCID: orcid.org/0000-0003-3378-7201²,
Yongqiang Yao³,
Fengwei Yu³ &
…
Wanli Ouyang^1,2

395 Accesses
1 Citation
3 Altmetric
Explore all metrics

Abstract

Multi-object Tracking (MOT) is one of the most fundamental computer vision tasks that contributes to various video analysis applications. Despite the recent promising progress, current MOT research is still limited to a fixed sampling frame rate of the input stream. They are neither as flexible as humans nor well-matched to industrial scenarios which require the trackers to be frame rate insensitive in complicated conditions. In fact, we empirically found that the accuracy of all recent state-of-the-art trackers drops dramatically when the input frame rate changes. For a more intelligent tracking solution, we shift the attention of our research work to the problem of Frame Rate Agnostic MOT (FraMOT), which takes frame rate insensitivity into consideration. In this paper, we propose a Frame Rate Agnostic MOT framework with a Periodic training Scheme (FAPS) to tackle the FraMOT problem for the first time. Specifically, we propose a Frame Rate Agnostic Association Module (FAAM) that infers and encodes the frame rate information to aid identity matching across multi-frame-rate inputs, improving the capability of the learned model in handling complex motion-appearance relations in FraMOT. Moreover, the association gap between training and inference is enlarged in FraMOT because those post-processing steps not included in training make a larger difference in lower frame rate scenarios. To address it, we propose Periodic Training Scheme to reflect all post-processing steps in training via tracking pattern matching and fusion. Along with the proposed approaches, we make the first attempt to establish an evaluation method for this new task of FraMOT. Besides providing simulations and evaluation metrics, we try to solve new challenges in two different modes, i.e., known frame rate and unknown frame rate, aiming to handle a more complex situation. The quantitative experiments on the challenging MOT17/20 dataset (FraMOT version) have clearly demonstrated that the proposed approaches can handle different frame rates better and thus improve the robustness against complicated scenarios.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

EUR 32.99 /Month

Get 10 units per month
Download Article/Chapter or Ebook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Cascaded-Scoring Tracklet Matching for Multi-object Tracking

MOTR: End-to-End Multiple-Object Tracking with Transformer

Adaptive Kalman Filter with power transformation for online multi-object tracking

Article 23 January 2023

Data Availibility

All data used in this paper are publicly available on corresponding websites. MOT17/MOT20: motchallenge.net; CrowdHuman: www.crowdhuman.org; CityScapes: www.cityscapes-dataset.com; HIE: humaninevents.org. SOMPT22: sompt22.github.io

References

Bernardin, K., & Stiefelhagen, R. (2008). Evaluating multiple object tracking performance: The clear mot metrics. Journal on Image and Video Processing, 2008, 1.
Article Google Scholar
Brasó, G., & Leal-Taixé, L. (2020). Learning a neural solver for multiple object tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 6247–6257).
Chu, P., & Ling, H. (2019). Famnet: Joint learning of feature, affinity and multi-dimensional assignment for online multiple object tracking. In The IEEE International Conference on Computer Vision (ICCV).
Chu, Q., Ouyang, W., Li, H., Wang, X., Liu, B., & Yu, N. (2017). Online multi-object tracking using cnn-based single object tracker with spatial-temporal attention mechanism. In ICCV.
Cordts, M., Omran, M., Ramos, S., Rehfeld, T. (2016). The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Dendorfer, P., Rezatofighi, H., Milan, A., Shi, J., Cremers, D., Reid,I., Roth, S., Schindler, K., & Leal-Taixe, L. (2020). Mot20: A benchmark for multi object tracking in crowded scenes. ar**v:2003.09003
Dendorfer, P., Osep, A., Milan, A., Schindler, K., Cremers, D., Reid, I., Roth, S., & Leal-Taixe, L. (2021). Motchallenge: A benchmark for single-camera multiple target tracking. International Journal of Computer Vision, 129(4), 845–881.
Article Google Scholar
Ge, Z., Liu, S., Wang, F., Li, Z., & Sun, J. (2021). YOLOX: Exceeding YOLO series in 2021. https://arxiv.org/abs/2107.08430.
Han, T., Bai, L., Gao, J., Wang, Q., & Ouyang, W. (2022). Dr. vic: Decomposition and reasoning for video individual counting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 3083–3092).
Hornakova, A., Henschel, R., Rosenhahn, B., & Swoboda, P. (2020). Lifted disjoint paths with application in multiple object tracking. In International conference on machine learning, PMLR (pp. 4364–4375).
Hu, W., Shi, X., Zhou, Z., & **ng, J. (2020). Dual l1-normalized context aware tensor power iteration and its applications to multi-object tracking and multi-graph matching. International Journal of Computer Vision, 128(2), 360–392.
Article MathSciNet Google Scholar
Kieritz, H., Hubner, W., & Arens, M. (2018). Joint detection and online multi-object tracking. In CVPRW.
Li, J., Gao, X., & Jiang, T. (2020). Graph networks for multiple object tracking. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (pp. 719–728).
Lin, W., Liu, H., Liu, S., Li, Y., Qian, R., Wang, T., Xu, N., **ong, H., Qi, G.-J. & Sebe, N. (2020). Human in events: A large-scale benchmark for human-centric video analysis in complex events. ar**v:2005.04490
Luiten, J., Osep, A., Dendorfer, P., Torr, P., Geiger, A., Leal-Taixe, L., & Leibe, B. (2021). Hota: A higher order metric for evaluating multi-object tracking. International Journal of Computer Vision, 129(2), 548–578.
Article Google Scholar
Ma, C., Yang, F., Li, Y., Jia, H., **e, X., & Gao, W. (2021). Deep trajectory post-processing and position projection for single & multiple camera multiple object tracking. International Journal of Computer Vision, 129(12), 3255–3278.
Article Google Scholar
Maksai, A., & Fua, P. (2019). Eliminating exposure bias and metric mismatch in multiple object tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 4639–4648).
Milan, A., Leal-Taixé, L., & Reid, I., Roth, S., & Schindler, K. (2016). MOT16: A benchmark for multi-object tracking. ar**v:1603.00831
Milan, A., Rezatofighi, S. H., Dick, A. R., Reid,. I., & Schindler, K. (2017). Online multi-target tracking using recurrent neural networks. In AAAI.
Ristani, E., Solera, F., Zou, R., Cucchiara, R., & Tomasi, C. (2016). Performance measures and a data set for multi-target, multi-camera tracking. In European Conference on Computer Vision (pp. 17–35). Springer.
Sadeghian, A., Alahi, A., & Savarese, S. (2017). Tracking the untrackable: Learning to track multiple cues with long-term dependencies. In ICCV.
Saleh, F., Aliakbarian, S., Rezatofighi, H., & Salzmann, M. (2021). Probabilistic tracklet scoring and inpainting for multiple object tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 14329–14339).
Shao, S., Zhao, Z., Li, B., **ao, T., Yu, G., Zhang, X., & Sun, J. (2018). Crowdhuman: A benchmark for detecting human in a crowd. ar**v:1805.00123
Simsek, F. E., Cigla, C., & Kayabol, K. (2023). Sompt22: A surveillance oriented multi-pedestrian tracking dataset. In Computer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part V (pp. 659–675). Springer.
Takala, V., & Pietikainen, M. (2007). Multi-object tracking using color, texture and motion. In 2007 IEEE Conference on Computer Vision and Pattern Recognition, IEEE (pp. 1–7).
Tang, S., Andres, B., Andriluka, M., & Schiele, B. (2016). Multi-person tracking by multicut and deep matching. In ECCV.
Tang, S., Andriluka, M., Andres, B., & Schiele, B. (2017). Multiple people tracking by lifted multicut and person reidentification. In CVPR.
Wang, Y., Kitani, K., & Weng, X. (2021). Joint object detection and multi-object tracking with graph neural networks. In 2021 IEEE International Conference on Robotics and Automation (ICRA), IEEE (pp. 13708–13715).
Wen, L., Lei, Z., Chang, M. C., Qi, H., & Lyu, S. (2017). Multi-camera multi-target tracking with space-time-view hyper-graph. International Journal of Computer Vision, 122(2), 313–333.
Article MathSciNet Google Scholar
Wojke, N., Bewley, A., & Paulus, D. (2017). Simple online and realtime tracking with a deep association metric. In ICIP.
Wu, J., Cao, J., & Song, L., Wang, Y., Yang, M., & Yuan, J. (2021). Track to detect and segment: An online multi-object tracker. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 12352–12361).
Xu, J., Cao, Y., Zhang, Z., & Hu, H. (2019). Spatial-temporal relation networks for multi-object tracking. In ICCV.
Yoon, J. H., Lee, C. R., Yang, M. H., & Yoon, K.-J. (2019). Structural constraint data association for online multi-object tracking. International Journal of Computer Vision, 127(1), 1–21.
Article Google Scholar
Yu, F., Li, W., Li, Q., Liu, Y, Shi, X., & Yan, J. (2016). Poi: Multiple object tracking with high performance detection and appearance feature. In ECCV.
Zhang, L., Li, Y., & Nevatia, R. (2008). Global data association for multi-object tracking using network flows. In CVPr.
Zhang, Y., Wang, C., Wang, X., Zend, W., & Liu, W. (2021). Fairmot: On the fairness of detection and re-identification in multiple object tracking. International Journal of Computer Vision, 129(11), 3069–3087.
Article Google Scholar
Zhang, Y., Sun, P., Jiang, Y., Yu, D., Weng, F., Yuan, Z., Luo, P., Liu, W., & Wang, X. (2022). Bytetrack: Multi-object tracking by associating every detection box. In Proceedings of the European Conference on Computer Vision (ECCV).
Zhou, X., Koltun, V., & Krähenbühl, P. (2020). Tracking objects as points. ECCV.

Download references

Acknowledgements

This work is partially supported by the National Key R &D Program of China (NO.2022ZD0160100). The Research is supported by Shanghai AI Laboratory. Wanli Ouyang was supported by the Australian Research Council Grant DP200103223, Australian Medical Research Future Fund MRFAI000085, CRC-P Smart Material Recovery Facility (SMRF) - Curby Soft Plastics, and CRC-P ARIA - Bionic Visual-Spatial Prosthesis for the Blind.

Author information

Authors and Affiliations

The University of Sydney, Sydney, NSW, 2006, Australia
Weitao Feng & Wanli Ouyang
Shanghai AI Laboratory, Shanghai, 200232, China
Lei Bai & Wanli Ouyang
SenseTime Research, Shanghai, China
Yongqiang Yao & Fengwei Yu

Authors

Weitao Feng
View author publications
You can also search for this author in PubMed Google Scholar
Lei Bai
View author publications
You can also search for this author in PubMed Google Scholar
Yongqiang Yao
View author publications
You can also search for this author in PubMed Google Scholar
Fengwei Yu
View author publications
You can also search for this author in PubMed Google Scholar
Wanli Ouyang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Lei Bai.

Additional information

Communicated by Matej Kristan.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Tacking Results Demo

Figures 12 and 13 show some selected tracking results of our approach on the MOT20 testing set. Different colors and different numbers represent different trajectories. We can see that at the lower frame rate scenarios (i.e., sampling factor \(k\ge 25\)), our method keeps a stable performance. More importantly, the results are all from the same model checkpoint, showing that our method can handle various frame rates robustly. Among these scenarios, the movements of the targets between adjacent frames are much larger than those in normal frame rate scenarios.

In Fig. 13, the targets of number 2, number 6 and number 19 do not have a large bounding box overlap between adjacent frames, leading to less reliable motion cues. In normal frame rate scenarios, such a large motion gap in adjacent frames usually indicates a different identity, which is quite different from lower frame rate scenarios. Thanks to the FAAM design and PTS strategy, Our method is able to make a correct prediction in multi-frame-rate settings simultaneously.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Feng, W., Bai, L., Yao, Y. et al. Towards Frame Rate Agnostic Multi-object Tracking. Int J Comput Vis 132, 1443–1462 (2024). https://doi.org/10.1007/s11263-023-01943-2

Download citation

Received: 09 August 2022
Accepted: 27 October 2023
Published: 20 November 2023
Issue Date: May 2024
DOI: https://doi.org/10.1007/s11263-023-01943-2

Keywords

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

EUR 32.99 /Month

Get 10 units per month
Download Article/Chapter or Ebook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Towards Frame Rate Agnostic Multi-object Tracking

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Cascaded-Scoring Tracklet Matching for Multi-object Tracking

MOTR: End-to-End Multiple-Object Tracking with Transformer

Adaptive Kalman Filter with power transformation for online multi-object tracking

Data Availibility

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Appendix A: Tacking Results Demo

Rights and permissions

About this article

Cite this article

Keywords

Subscribe and save

Buy Now

Navigation

Towards Frame Rate Agnostic Multi-object Tracking

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Cascaded-Scoring Tracklet Matching for Multi-object Tracking

MOTR: End-to-End Multiple-Object Tracking with Transformer

Adaptive Kalman Filter with power transformation for online multi-object tracking

Data Availibility

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Appendix A: Tacking Results Demo

Appendix A: Tacking Results Demo

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now

Search

Navigation