Abstract
In recent years, tracking-by-detection (TBD) has emerged as the predominant approach for Multi-object Tracking (MOT). Most TBD algorithms typically employ separate branch heads to handle the coarse feature representations extracted from the backbone network. However, this approach often leads to suboptimal model performance. This is due to the presence of feature conflicts among multiple tasks on one hand, and the lack of information interaction among these tasks on the other. Inspired by CSTrack and Unihead, we have combined the feature decoupling module REN with the transformer-based task interaction module CIT to form CrossUnihead which is a plug and play module. This module not only effectively achieves task-based feature decoupling, but also facilitates information interaction among different tasks. The MOT algorithm based on CrossUnihead, named UniTracker, demonstrates impressive performance on the MOT data-set when compared to other advanced methods and is capable of achieving real-time tracking.
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11554-024-01514-9/MediaObjects/11554_2024_1514_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11554-024-01514-9/MediaObjects/11554_2024_1514_Fig2_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11554-024-01514-9/MediaObjects/11554_2024_1514_Fig3_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11554-024-01514-9/MediaObjects/11554_2024_1514_Fig4_HTML.png)
Data availability
No datasets were generated or analyzed during the current study.
References
Aharon, N., Orfaig, R., Bobrovsky, B.Z.: Bot-sort: Robust associations multi-pedestrian tracking. ar**v preprint (2022) ar**v:2206.14651
Bae, S.H., Yoon, K.J.: Confidence-based data association and discriminative deep appearance learning for robust online multi-object tracking. IEEE Trans. Pattern Anal. Mach. Intell. 40(3), 595–610 (2017)
Bewley, A., Ge, Z., Ott, L., Ramos, F., Upcroft, B.: Simple online and realtime tracking. In: 2016 IEEE international conference on image processing (ICIP), pp. 3464–3468. IEEE (2016)
Cao, J., Pang, J., Weng, X., Khirodkar, R., Kitani, K.: Observation-centric sort: Rethinking sort for robust multi-object tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9686–9696 (2023)
Chen, L., Ai, H., Shang, C., Zhuang, Z., Bai, B.: Online multi-object tracking with convolutional neural networks. In: 2017 IEEE international conference on image processing (ICIP), pp. 645–649. IEEE (2017)
Chen, S., Sun, P., Song, Y., Luo, P.: Diffusiondet: Diffusion model for object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 19830–19843 (2023)
Chen, X., Iranmanesh, S.M., Lien, K.C.: Patchtrack: Multiple object tracking using frame patches. ar**v preprint (2022) ar**v:2201.00080
Du, Y., Zhao, Z., Song, Y., Zhao, Y., Su, F., Gong, T., Meng, H.: Strongsort: Make deepsort great again. IEEE Trans. Multimed. 25, 8725–8737 (2023). https://doi.org/10.1109/TMM.2023.3240881
Duan, K., Bai, S., **e, L., Qi, H., Huang, Q., Tian, Q.: Centernet: Keypoint triplets for object detection. In: Proceedings of the IEEE/CVF international conference on computer vision, pp. 6569–6578 (2019)
Fang, K., **ang, Y., Li, X., Savarese, S.: Recurrent autoregressive networks for online multi-object tracking. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 466–475. IEEE (2018)
Fast, P.: R-cnn. In: Digital TV and Wireless Multimedia Communication: 14th International Forum, IFTC 2017, Shanghai, China, November 8–9, 2017, Revised Selected Papers, vol. 815, p. 172. Springer (2018)
Feng, C., Zhong, Y., Gao, Y., Scott, M.R., Huang, W.: Tood: Task-aligned one-stage object detection. In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 3490–3499. IEEE Computer Society (2021)
Girshick, R.: Fast r-cnn. In: Proceedings of the IEEE international conference on computer vision, pp. 1440–1448 (2015)
Han, S., Huang, P., Wang, H., Yu, E., Liu, D., Pan, X.: Mat: Motion-aware multi-object tracking. Neurocomputing 476, 75–86 (2022)
He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: Proceedings of the IEEE international conference on computer vision, pp. 2961–2969 (2017)
Law, H., Deng, J.: Cornernet: Detecting objects as paired keypoints. In: Proceedings of the European conference on computer vision (ECCV), pp. 734–750 (2018)
Liang, C., Zhang, Z., Zhou, X., Li, B., Zhu, S., Hu, W.: Rethinking the competition between detection and reid in multiobject tracking. IEEE Trans. Image Process. 31, 3182–3196 (2022)
Luo, R., Song, Z., Ma, L., Wei, J., Yang, W., Yang, M.: Diffusiontrack: Diffusion model for multi-object tracking. Proc. AAAI Conf. Artif. Intell. 38, 3991–3999 (2024)
Maggiolino, G., Ahmad, A., Cao, J., Kitani, K.: Deep oc-sort: Multi-pedestrian tracking by adaptive re-identification. ar**v preprint (2023) ar**v:2302.11813
Meinhardt, T., Kirillov, A., Leal-Taixe, L., Feichtenhofer, C.: Trackformer: Multi-object tracking with transformers. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 8844–8854 (2022)
Pang, B., Li, Y., Zhang, Y., Li, M., Lu, C.: Tubetk: Adopting tubes to track multi-object in a one-step training model. pp. 6308–6318 (2020)
Pang, J., Qiu, L., Li, X., Chen, H., Li, Q., Darrell, T., Yu, F.: Quasi-dense similarity learning for multiple object tracking. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 164–173 (2021)
Peng, J., Wang, C., Wan, F., Wu, Y., Wang, Y., Tai, Y., Wang, C., Li, J., Huang, F., Fu, Y.: Chained-tracker: Chaining paired attentive regression results for end-to-end joint multiple-object detection and tracking. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part IV 16, pp. 145–161. Springer (2020)
Qin, Z., Zhou, S., Wang, L., Duan, J., Hua, G., Tang, W.: Motiontrack: Learning robust short-term and long-term motions for multi-object tracking. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 17939–17948 (2023)
Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: Unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 779–788 (2016)
Ren, H., Han, S., Ding, H., Zhang, Z., Wang, H., Wang, F.: Focus on details: Online multi-object tracking with diverse fine-grained representation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11289–11298 (2023)
Ren, S., He, K., Girshick, R., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(06), 1137–1149 (2017)
Ren, W., Wu, D., Cao, H., Chen, B., Shi, Y., Jiang, W., Liu, H.: Countingmot: Joint counting, detection and re-identification for multiple object tracking. ar**v preprint (2022) ar**v:2212.05861
Sanchez-Matilla, R., Poiesi, F., Cavallaro, A.: Online multi-target tracking with strong and weak detections. In: Computer Vision–ECCV 2016 Workshops: Amsterdam, The Netherlands, October 8–10 and 15–16, 2016, Proceedings, Part II 14, pp. 84–99. Springer (2016)
Schwarz, W., Miller, J.: Gsdt: An integrative model of visual search. J. Exp. Psychol. Human Percept. Perform. 42(10), 1654 (2016)
Seidenschwarz, J., Brasó, G., Serrano, V.C., Elezi, I., Leal-Taixé, L.: Simple cues lead to a strong multi-object tracker. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 13813–13823 (2023)
Song, G., Liu, Y., Wang, X.: Revisiting the sibling head in object detector. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 11563–11572 (2020)
Stadler, D., Beyerer, J.: Past information aggregation for multi-person tracking. In: 2023 IEEE International Conference on Image Processing (ICIP), pp. 321–325. IEEE (2023)
Sun, P., Cao, J., Jiang, Y., Zhang, R., **e, E., Yuan, Z., Wang, C., Luo, P.: Transtrack: Multiple object tracking with transformer. ar**v preprint (2020) ar**v:2012.15460
Sun, S., Akhtar, N., Song, H., Mian, A., Shah, M.: Deep affinity network for multiple object tracking. IEEE Trans. Pattern Anal. Mach. Intell. 43(1), 104–119 (2019)
Tian, Z., Shen, C., Chen, H., He, T.: Fcos: Fully convolutional one-stage object detection. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 9626–9635 (2019).
Wang, G., Chen, Y., An, P., Hong, H., Hu, J., Huang, T.: Uav-yolov8: a small-object-detection model based on improved yolov8 for uav aerial photography scenarios. Sensors 23(16), 7190 (2023)
Wang, J., Peng, Y., Yang, X., Wang, T., Zhang, Y.: Sportstrack: An innovative method for tracking athletes in sports scenes. ar**v preprint (2022) ar**v:2211.07173
Wojke, N., Bewley, A., Paulus, D.: Simple online and realtime tracking with a deep association metric. In: 2017 IEEE international conference on image processing (ICIP), pp. 3645–3649. IEEE (2017)
Wu, J., Cao, J., Song, L., Wang, Y., Yang, M., Yuan, J.: Track to detect and segment: An online multi-object tracker. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 12352–12361 (2021)
**ang, Y., Alahi, A., Savarese, S.: Learning to track: Online multi-object tracking by decision making. In: Proceedings of the IEEE international conference on computer vision, pp. 4705–4713 (2015)
Xu, Y., Ban, Y., Delorme, G., Gan, C., Rus, D., Alameda-Pineda, X.: Transcenter: transformers with dense representations for multiple-object tracking. IEEE Trans. Pattern Anal. Mach. Intell. 45(6), 7820–7835 (2022)
Yang, F., Odashima, S., Masui, S., Jiang, S.: Hard to track objects with irregular motions and similar appearances? make it easier by buffering the matching space. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp. 4799–4808 (2023)
You, S., Yao, H., Bao, B.k., Xu, C.: Utm: A unified multiple object tracking model with identity-aware feature enhancement. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 21876–21886 (2023)
Yu, E., Li, Z., Han, S., Wang, H.: Relationtrack: Relation-aware multiple object tracking with decoupled representation. IEEE Trans. Multimed. 25, 2686–2697 (2023). https://doi.org/10.1109/TMM.2022.3150169
Yu, F., Li, W., Li, Q., Liu, Y., Shi, X., Yan, J.: Poi: Multiple object tracking with high performance detection and appearance feature. In: Computer Vision–ECCV 2016 Workshops: Amsterdam, The Netherlands, October 8-10 and 15-16, 2016, Proceedings, Part II 14, pp. 36–42. Springer (2016)
Zeng, F., Dong, B., Zhang, Y., Wang, T., Zhang, X., Wei, Y.: Motr: End-to-end multiple-object tracking with transformer. In: European Conference on Computer Vision, pp. 659–675. Springer (2022)
Zhang, Y., Jia, Y., **e, H., Li, M., Zhao, L., Yang, Y., Zhao, S.: Rt-track: robust tricks for multi-pedestrian tracking. ar**v preprint (2023) ar**v:2303.09668
Zhang, Y., Sun, P., Jiang, Y., Yu, D., Weng, F., Yuan, Z., Luo, P., Liu, W., Wang, X.: Bytetrack: Multi-object tracking by associating every detection box. In: European Conference on Computer Vision, pp. 1–21. Springer (2022)
Zhang, Y., Wang, C., Wang, X., Zeng, W., Liu, W.: Fairmot: On the fairness of detection and re-identification in multiple object tracking. Int. J. Comput. Vis. 129, 3069–3087 (2021)
Zhao, L., Li, S.: Object detection algorithm based on improved yolov3. Electronics 9(3), 537 (2020)
Zhou, H., Yang, R., Zhang, Y., Duan, H., Huang, Y., Hu, R., Li, X., Zheng, Y.: Unihead: unifying multi-perception for detection heads. ar**v preprint (2023) ar**v:2309.13242
Zhou, X., Koltun, V., Krähenbühl, P.: Tracking objects as points. In: European conference on computer vision, pp. 474–490. Springer (2020)
Zhou, Z., **ng, J., Zhang, M., Hu, W.: Online multi-target tracking with tensor-based high-order graph matching. In: 2018 24th International Conference on Pattern Recognition (ICPR), pp. 1809–1814. IEEE (2018)
Zhu, C., He, Y., Savvides, M.: Feature selective anchor-free module for single-shot object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 840–849 (2019)
Acknowledgements
This work is supported in part by the Natural Science Foundation of Jiangsu Province (Grant No. BK20151102, BK20201267), in part by the State Key Laboratory for Novel Software Technology, Nan**g University (Grant No. KFKT2017B17), and in part by the Natural Science Foundation of China (Grant No. 61673108).
Author information
Authors and Affiliations
Contributions
F.W. is mainly responsible for experimental simulation and paper writing. Y.Z. is mainly responsible for paper review and work guidance.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wu, F., Zhang, Y. UniTracker: transformer-based CrossUnihead for multi-object tracking. J Real-Time Image Proc 21, 133 (2024). https://doi.org/10.1007/s11554-024-01514-9
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s11554-024-01514-9