Abstract
Video object segmentation (VOS) is widely used in computer vision. However, existing VOS algorithms still struggle with object deformation, occlusion, and fast motion. We therefore propose an effective VOS algorithm based on semantic visual words matching. Specifically, given a support frame and its corresponding mask, the frame is first fed to an encoder with an embedding layer, and a clustering algorithm is then applied to generate a group of semantic visual words according to the mask. For a query frame to be segmented, a matching operation is performed against the words generated from the support frame. In this manner, each pixel of the query frame is assigned to an object category according to the resulting similarity. In addition, a self-attention mechanism is applied before the words matching to enhance the embedding features and capture global dependencies. To further handle changes in object appearance and global mismatches, our method also employs online update and correction mechanisms. Experiments show that the proposed method achieves competitive results on the DAVIS 2016 and DAVIS 2017 datasets. The J&F-mean, the average of region similarity (J) and contour accuracy (F), reaches 83.2% on DAVIS 2016 and 72.3% on DAVIS 2017.
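The word-generation and matching steps described above can be sketched on toy data. This is an illustrative sketch only, not the authors' implementation: the abstract does not name the clustering algorithm or the similarity measure, so plain k-means and cosine similarity are assumptions here, and the encoder, self-attention, and online update and correction steps are omitted. All function names (`build_visual_words`, `match_query`, `toy_embeddings`) are hypothetical, and random vectors stand in for learned embeddings.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def kmeans(points, k, iters=10, seed=0):
    """Plain k-means on unit vectors (assumed clustering step); returns k centroids."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(points @ centers.T, axis=1)
        for j in range(k):
            members = points[assign == j]
            if len(members) > 0:  # an empty cluster keeps its previous centroid
                centers[j] = members.mean(axis=0)
        centers = l2_normalize(centers)
    return centers


def build_visual_words(support_emb, support_mask, words_per_object=4):
    """Cluster support-frame embeddings per mask label into visual words.

    support_emb: (H, W, C) embeddings; support_mask: (H, W) integer labels.
    Returns words of shape (K, C) and the object label of each word, shape (K,).
    """
    emb = l2_normalize(support_emb.reshape(-1, support_emb.shape[-1]))
    labels = support_mask.reshape(-1)
    words, word_labels = [], []
    for obj in np.unique(labels):
        pts = emb[labels == obj]
        k = min(words_per_object, len(pts))
        words.append(kmeans(pts, k))
        word_labels.extend([obj] * k)
    return np.concatenate(words), np.array(word_labels)


def match_query(query_emb, words, word_labels):
    """Label each query pixel with the object of its most similar visual word."""
    h, w, c = query_emb.shape
    sim = l2_normalize(query_emb.reshape(-1, c)) @ words.T  # (H*W, K)
    return word_labels[np.argmax(sim, axis=1)].reshape(h, w)


def toy_embeddings(mask, rng, dim=4, noise=0.05):
    """Stand-in for an encoder: one base direction per label plus small noise."""
    base = np.eye(dim)[mask]  # (H, W, dim)
    return base + noise * rng.standard_normal(base.shape)


rng = np.random.default_rng(0)

# Support frame: object (label 1) occupies the right half.
support_mask = np.zeros((8, 8), dtype=int)
support_mask[:, 4:] = 1
# Query frame: the object has moved to the top half.
query_mask = np.zeros((8, 8), dtype=int)
query_mask[:4, :] = 1

words, word_labels = build_visual_words(toy_embeddings(support_mask, rng), support_mask)
pred = match_query(toy_embeddings(query_mask, rng), words, word_labels)
print("pixel accuracy:", (pred == query_mask).mean())
```

Because each query pixel is compared with the words rather than with pixel positions in the support frame, the toy object is still labeled correctly after it moves. In the actual method the embeddings come from a learned encoder, so the separation between objects is far less clean than in this synthetic setup.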
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Ethics declarations
Conflict of Interests
The authors declare that they have no conflict of interest.
Additional information
This work was partially supported by the National Natural Science Foundation of China (Grant No. 61802197), in part by the Science and Technology Development Fund, Macau SAR (File Nos. SKL-IOTSC-2018-2020, 0018/2019/AKP, 0008/2019/AGJ, and FDCT/194/2017/A3), and in part by the University of Macau under Grants MYRG2018-00248-FST and MYRG2019-0137-FST.
About this article
Cite this article
Hao, C., Chen, Y., Wu, W. et al. Video object segmentation through semantic visual words matching. Multimed Tools Appl 82, 19591–19605 (2023). https://doi.org/10.1007/s11042-023-14361-w
DOI: https://doi.org/10.1007/s11042-023-14361-w