Abstract
Object annotation is essential for computer vision tasks, and more high-quality annotated data can effectively improve the performance of vision models. However, manual annotation is time-consuming (annotating a box takes 35 s). Recent studies have explored faster automated annotation, among which weakly supervised methods stand out. Weakly supervised methods learn to localize objects in images automatically from weak annotations, e.g., class tags or points, which replace manual bounding box annotations. Although relying on a single type of weak annotation saves a large amount of time, it leads to poor annotation quality, particularly for complex scenes containing multiple objects. To balance annotation time and quality, we propose a weakly semi-supervised automated annotation method. Its main idea is to incorporate point-labeled and fully labeled annotations into a teacher-student framework for training, so as to jointly localize the object bounding boxes on all point-labeled images. We also propose two effective techniques within this framework to make better use of these mixed annotations. The first is a point-guided sample assignment technique, which optimizes the loss calculation. The second is a pseudo-label filtering technique, which generates accurate pseudo labels for model training by exploiting the points and the box localization confidences. Extensive experiments on MSCOCO demonstrate that our method outperforms existing automated annotation methods. In particular, when using 95% point-labeled and 5% fully labeled data, our approach reduces the annotation time by approximately 52% and achieves an annotation quality of 87.4% mIoU.
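The abstract does not spell out how the pseudo-label filtering or the mIoU evaluation is implemented, so the following is only a minimal sketch under stated assumptions rather than the authors' method. It assumes that a teacher prediction is kept for an annotated point only if its box contains the point and its localization confidence reaches a threshold, and that the most confident such box becomes the pseudo label. The names `Prediction`, `loc_conf`, `filter_pseudo_labels`, `mean_iou`, and the threshold `min_conf` are hypothetical and introduced purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Point = Tuple[float, float]              # (x, y)


@dataclass
class Prediction:
    box: Box
    loc_conf: float  # hypothetical localization confidence in [0, 1]


def contains(box: Box, point: Point) -> bool:
    """Return True if the point lies inside the box (borders included)."""
    x1, y1, x2, y2 = box
    px, py = point
    return x1 <= px <= x2 and y1 <= py <= y2


def filter_pseudo_labels(
    points: List[Point], preds: List[Prediction], min_conf: float = 0.5
) -> List[Optional[Box]]:
    """Assumed rule: for each point, keep the most confident box containing it.

    Returns None for a point when no prediction passes both checks.
    """
    selected: List[Optional[Box]] = []
    for point in points:
        candidates = [
            p for p in preds
            if contains(p.box, point) and p.loc_conf >= min_conf
        ]
        best = max(candidates, key=lambda p: p.loc_conf, default=None)
        selected.append(best.box if best is not None else None)
    return selected


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def mean_iou(pred_boxes: List[Optional[Box]], gt_boxes: List[Box]) -> float:
    """Mean IoU over objects; a missing pseudo label counts as IoU 0 (assumption)."""
    if not gt_boxes:
        return 0.0
    scores = [iou(p, g) if p is not None else 0.0 for p, g in zip(pred_boxes, gt_boxes)]
    return sum(scores) / len(gt_boxes)
```

In this sketch the point acts as a hard spatial constraint and the confidence only ranks the boxes that satisfy it, which is one simple way to combine the two signals; the paper's actual filtering rule may differ.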
Data Availability Statement
The datasets generated and analysed during the current study are not publicly available due to the excessive size of MSCOCO but are available from the corresponding author on reasonable request.
Acknowledgments
This work was supported by the NSFC fund (62171288), Shenzhen Fundamental Research fund under Grant 20200810150441003 and JCYJ20190808143415801, and the Guangdong Basic and Applied Basic Research Foundation under Grant 2020A1515011559 and 2021A1515012287.
Ethics declarations
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Wang, X., Wei, G., Chen, S. et al. An efficient weakly semi-supervised method for object automated annotation. Multimed Tools Appl 83, 9417–9440 (2024). https://doi.org/10.1007/s11042-023-15305-0
DOI: https://doi.org/10.1007/s11042-023-15305-0