Abstract
Object tracking has emerged as an essential process for various applications in the field of computer vision, such as autonomous driving. Recently, object tracking technology has experienced rapid growth, particularly its applications in self-driving vehicles. Tracking systems typically follow the detection-based tracking paradigm, which is affected by the detection results. Although deep learning has led to significant improvements in object detection, data association remains dependent on factors such as spatial location, motion, and appearance, to associate new observations with existing tracks. In this study, we introduce a novel approach called Combined Appearance-Motion Tracking (CAMTrack) to enhance data association by integrating object appearances and their corresponding movements. The proposed tracking method utilizes an appearance-motion model using an appearance-affinity network and an Interactive Multiple Model (IMM). We deploy the appearance model to address the visual affinity between objects across frames and employed the motion model to incorporate motion constraints to obtain robust position predictions under maneuvering movements. Moreover, we also propose a Two-phase association algorithm which is an effective way to recover lost tracks back from previous frames. CAMTrack was evaluated on the widely recognized object tracking benchmarks-KITTI and MOT17. The results showed the superior performance of the proposed method, highlighting its potential to contribute to advances in object tracking.
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00138-024-01548-w/MediaObjects/138_2024_1548_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00138-024-01548-w/MediaObjects/138_2024_1548_Fig2_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00138-024-01548-w/MediaObjects/138_2024_1548_Fig3_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00138-024-01548-w/MediaObjects/138_2024_1548_Figa_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00138-024-01548-w/MediaObjects/138_2024_1548_Fig4_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00138-024-01548-w/MediaObjects/138_2024_1548_Fig5_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00138-024-01548-w/MediaObjects/138_2024_1548_Fig6_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00138-024-01548-w/MediaObjects/138_2024_1548_Fig7_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00138-024-01548-w/MediaObjects/138_2024_1548_Fig8_HTML.png)
Similar content being viewed by others
References
Bolme, D.S., Beveridge, J.R., Draper, B.A., Lui, Y.M.: Visual object tracking using adaptive correlation filters. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2544–2550, IEEE (2010)
Henriques, J.F., Caseiro, R., Martins, P., Batista, J.: High-speed tracking with kernelized correlation filters. IEEE Trans. Pattern Anal. Mach. Intell. 37(3), 583–596 (2014)
Li, Y., Huang, J., Li, Y., Wang, S., Yang, M.: Proceeding IEEE Conference on Computer Vision and Pattern Recognition (2016)
Butt, A.A., Collins, R.T.: Multi-target tracking by lagrangian relaxation to min-cost network flow. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2013)
Berclaz, J., Fleuret, F., Turetken, E., Fua, P.: Multiple object tracking using k-shortest paths optimization. IEEE Trans. Pattern Anal. Mach. Intell. 33(9), 1806–1819 (2011). https://doi.org/10.1109/TPAMI.2011.21
Tang, S., Andres, B., Andriluka, M., Schiele, B.: Subgraph decomposition for multi-target tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5033–5041 (2015)
Dehghan, A., Modiri Assari, S., Shah, M.: Gmmcp tracker: globally optimal generalized maximum multi clique problem for multiple object tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4091–4099 (2015)
Wu, M., Peng, X.: Motion constraint markov network model for multi-target tracking. In: 2008 International Conference on Audio, Language and Image Processing, pp. 981–987, IEEE (2008)
Milan, A., Schindler, K., Roth, S.: Detection-and trajectory-level exclusion in multiple object tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3682–3689 (2013)
Milan, A., Roth, S., Schindler, K.: Continuous energy minimization for multitarget tracking. IEEE Trans. Pattern Anal. Mach. Intell. 36(1), 58–72 (2013)
Hoang, H.A., Yoo, M.: 3onet: 3-d detector for occluded object under obstructed conditions. IEEE Sens. J. 23(16), 18879–18892 (2023). https://doi.org/10.1109/JSEN.2023.3293515
Cao, J., Weng, X., Khirodkar, R., Pang, J., Kitani, K.: Observation-centric sort: rethinking sort for robust multi-object tracking. ar**v preprint ar**v:2203.14360 (2022)
Bewley, A., Ge, Z., Ott, L., Ramos, F., Upcroft, B.: Simple online and realtime tracking. In: 2016 IEEE International Conference on Image Processing (ICIP), pp. 3464–3468, IEEE (2016)
Chaabane, M., Zhang, P., Beveridge, J.R., O’Hara, S.: Deft: detection embeddings for tracking. ar**v preprint ar**v:2102.02267 (2021)
Li, X.R., Jilkov, V.P.: Survey of maneuvering target tracking . Part v. multiple-model methods. IEEE Trans. Aerosp. Electron. Syst. 41(4), 1255–1321 (2005)
Zhao, D., Fu, H., **ao, L., Wu, T., Dai, B.: Multi-object tracking with correlation filter for autonomous vehicle. Sensors 18(7), 2004 (2018)
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., Berg, A.C.: Ssd: single shot multibox detector. In: Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pp. 21–37, Springer (2016)
Gündüz, G., Acarman, T.: A lightweight online multiple object vehicle tracking method. In: 2018 IEEE Intelligent Vehicles Symposium (IV), pp. 427–432, IEEE (2018)
Ren, S., He, K., Girshick, R., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. In: Cortes, C., Lawrence, N., Lee, D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 28. Curran Associates, Inc., (2015)
Zhang, W., Zhou, H., Sun, S., Wang, Z., Shi, J., Loy, C.C.: Robust multi-modality multi-object tracking. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2365–2374 (2019)
Lang, A.H., Vora, S., Caesar, H., Zhou, L., Yang, J., Beijbom, O.: Pointpillars: fast encoders for object detection from point clouds. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12697–12705 (2019)
Gonzalez, N.F., Ospina, A., Calvez, P.: Smat: Smart multiple affinity metrics for multiple object tracking. In: Image Analysis and Recognition: 17th International Conference, ICIAR 2020, Póvoa de Varzim, Portugal, June 24–26, 2020, Proceedings, Part II 17, pp. 48–62, Springer (2020)
Hu, H.-N., Cai, Q.-Z., Wang, D., Lin, J., Sun, M., Krahenbuhl, P., Darrell, T., Yu, F.: Joint monocular 3d vehicle detection and tracking. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5390–5399 (2019)
Marinello, N., Proesmans, M., Van Gool, L.: Triplettrack: 3d object tracking using triplet embeddings and lstm. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4500–4510 (2022)
Wang, S., Cai, P., Wang, L., Liu, M.: Ditnet: end-to-end 3d object detection and track id assignment in spatio-temporal world. IEEE Robot. Autom. Lett. 6(2), 3397–3404 (2021)
Wang, X., Fu, C., Li, Z., Lai, Y., He, J.: Deepfusionmot: a 3d multi-object tracking framework based on camera-lidar fusion with deep association. IEEE Robot. Autom. Lett. 7(3), 8260–8267 (2022)
Ren, J., Chen, X., Liu, J., Sun, W., Pang, J., Yan, Q., Tai, Y.-W., Xu, L.: Accurate single stage detector using recurrent rolling convolution. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5420–5428 (2017)
Wang, X., Fu, C., He, J., Wang, S., Wang, J.: Strongfusionmot: a multi-object tracking method based on lidar-camera fusion. IEEE Sens. J. (2022). https://doi.org/10.1109/JSEN.2022.3226490
Hu, H.-N., Yang, Y.-H., Fischer, T., Darrell, T., Yu, F., Sun, M.: Monocular quasi-dense 3d object tracking. IEEE Trans. Pattern Anal. Mach. Intell. 45(2), 1992–2008 (2022)
Rangesh, A., Maheshwari, P., Gebre, M., Mhatre, S., Ramezani, V., Trivedi, M.M.: Trackmpnn: A message passing graph neural architecture for multi-object tracking. ar**v preprint ar**v:2101.04206 (2021)
Wang, G., Gu, R., Liu, Z., Hu, W., Song, M., Hwang, J.-N.: Track without appearance: learn box and tracklet embedding with local and global motion patterns for vehicle tracking. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9876–9886 (2021)
Zhou, X., Wang, D., Krähenbühl, P.: Objects as points. ar**v preprint ar**v:1904.07850 (2019)
Kim, A., Ošep, A., Leal-Taixé, L.: Eagermot: 3d multi-object tracking via sensor fusion. In: 2021 IEEE International Conference on Robotics and Automation (ICRA), pp. 11315–11321, IEEE (2021)
Reich, A., Wuensche, H.-J.: Monocular 3d multi-object tracking with an ekf approach for long-term stable tracks. In: 2021 IEEE 24th International Conference on Information Fusion (FUSION), pp. 1–7, IEEE (2021)
Zhou, X., Koltun, V., Krähenbühl, P.: Tracking objects as points. In: European Conference on Computer Vision, pp. 474–490, Springer (2020)
Ge, Z., Liu, S., Wang, F., Li, Z., Sun, J.: Yolox: exceeding yolo series in 2021. ar**v preprint ar**v:2107.08430 (2021)
Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)
Duong, M.-T., Lee, S., Hong, M.-C.: Dmt-net: deep multiple networks for low-light image enhancement based on retinex model. IEEE Access 11, 132147–132161 (2023)
Yazdian-Dehkordi, M., Azimifar, Z.: Adaptive visual target detection and tracking using incremental appearance learning. In: 2015 IEEE International Conference on Image Processing (ICIP), pp. 1041–1045, IEEE (2015)
Karunasekera, H., Wang, H., Zhang, H.: Multiple object tracking with attention to appearance, structure, motion and size. IEEE Access 7, 104423–104434 (2019)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. ar**v preprint ar**v:1409.1556 (2014)
Chu, P., Ling, H.: Famnet: joint learning of feature, affinity and multi-dimensional assignment for online multiple object tracking. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6172–6181 (2019)
Sun, S., Akhtar, N., Song, H., Mian, A., Shah, M.: Deep affinity network for multiple object tracking. IEEE Trans. Pattern Anal. Mach. Intell. 43(1), 104–119 (2019)
Sadeghian, A., Alahi, A., Savarese, S.: Tracking the untrackable: learning to track multiple cues with long-term dependencies. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 300–311 (2017)
Zhou, Q., Zhong, B., Zhang, Y., Li, J., Fu, Y.: Deep alignment network based multi-person tracking with occlusion and motion reasoning. IEEE Trans. Multimed. 21(5), 1183–1194 (2018)
Yoon, K., Kim, D.Y., Yoon, Y.-C., Jeon, M.: Data association for multi-object tracking via deep neural networks. Sensors 19(3), 559 (2019)
Xu, B., Liang, D., Li, L., Quan, R., Zhang, M.: An effectively finite-tailed updating for multiple object tracking in crowd scenes. Appl. Sci. 12(3), 1061 (2022)
Mahmoudi, N., Ahadi, S.M., Rahmati, M.: Multi-target tracking using cnn-based features: Cnnmtt. Multimed. Tools Appl. 78(6), 7077–7096 (2019)
Lan, L., Wang, X., Zhang, S., Tao, D., Gao, W., Huang, T.S.: Interacting tracklets for multi-object tracking. IEEE Trans. Image Process. 27(9), 4585–4597 (2018)
Held, D., Thrun, S., Savarese, S.: Learning to track at 100 fps with deep regression networks. In: Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pp. 749–765, Springer (2016)
**ao, B., Wu, H., Wei, Y.: Simple baselines for human pose estimation and tracking. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 466–481 (2018)
**ang, J., Zhang, G., Hou, J.: Online multi-object tracking based on feature representation and Bayesian filtering within a deep learning architecture. IEEE Access 7, 27923–27935 (2019)
Wu, H., Wen, C., Shi, S., Li, X., Wang, C.: Virtual sparse convolution for multimodal 3d object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 21653–21662 (2023)
Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Res. Logist. Q. 2(1–2), 83–97 (1955)
Zhang, Y., Sun, P., Jiang, Y., Yu, D., Yuan, Z., Luo, P., Liu, W., Wang, X.: Bytetrack: multi-object tracking by associating every detection box. ar**v preprint ar**v:2110.06864 (2021)
Chu, P., Fan, H., Tan, C.C., Ling, H.: Online multi-object tracking with instance-aware tracker and dynamic model refreshment. In: 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 161–170, IEEE (2019)
Yu, F., Wang, D., Shelhamer, E., Darrell, T.: Deep layer aggregation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2403–2412 (2018)
Chaabane, M., Gueguen, L., Trabelsi, A., Beveridge, R., O’Hara, S.: End-to-end learning improves static object geo-localization in monocular video. ar**v preprint ar**v:2004.05232 (2020)
Luiten, J., Osep, A., Dendorfer, P., Torr, P.H.S., Geiger, A., Leal-Taixé, L., Leibe, B.: HOTA: A higher order metric for evaluating multi-object tracking. CoRR abs/2009.07736 (2020) 2009.07736
Bernardin, K., Stiefelhagen, R.: Evaluating multiple object tracking performance: the clear mot metrics. EURASIP J. Image Video Process. (2008). https://doi.org/10.1155/2008/246309
Ristani, E., Solera, F., Zou, R.S., Cucchiara, R., Tomasi, C.: Performance measures and a data set for multi-target, multi-camera tracking. CoRR abs/1609.01775 (2016) 1609.01775
Jiang, C., Wang, Z., Liang, H., Tan, S.: A fast and high-performance object proposal method for vision sensors: application to object detection. IEEE Sens. J. 22(10), 9543–9557 (2022)
Choi, W.: Near-online multi-target tracking with aggregated local flow descriptor. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3029–3037 (2015)
Bergmann, P., Meinhardt, T., Leal-Taixe, L.: Tracking without bells and whistles. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 941–951 (2019)
Liu, Q., Chu, Q., Liu, B., Yu, N.: Gsm: graph similarity model for multi-object tracking. In: IJCAI, pp. 530–536 (2020)
Brasó, G., Leal-Taixé, L.: Learning a neural solver for multiple object tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6247–6257 (2020)
Acknowledgements
This research was supported by the National Research Foundation of Korea (NRF) grant funded by the Government of South Korea (MSIT)(NRF-2021R1A2B5B01002559).
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Bui, D.C., Nguyen, N.L., Hoang, A.H. et al. CAMTrack: a combined appearance-motion method for multiple-object tracking. Machine Vision and Applications 35, 62 (2024). https://doi.org/10.1007/s00138-024-01548-w
Received:
Revised:
Accepted:
Published:
DOI: https://doi.org/10.1007/s00138-024-01548-w