An efficient weakly semi-supervised method for object automated annotation

Multimedia Tools and Applications

Abstract

Object annotation is essential for computer vision tasks, and more high-quality annotated data can effectively improve the performance of vision models. However, manual annotation is time-consuming (annotating a single box takes 35 s). Recent studies have explored faster automated annotation, among which weakly supervised methods stand out. These methods learn to localize objects in images automatically from weak annotations, e.g., class tags or points, in place of manual bounding box annotations. Although relying on a single kind of weak annotation saves a large amount of time, it leads to poor annotation quality, particularly for complex scenes containing multiple objects. To balance annotation time and quality, we propose a weakly semi-supervised automated annotation method. Its main idea is to incorporate point-labeled and fully labeled annotations into a teacher-student framework for training, so as to jointly localize the object bounding boxes on all point-labeled images. We also propose two effective techniques within this framework to make better use of these mixed annotations. The first is a point-guided sample assignment technique, which optimizes the loss calculation. The second is a pseudo-label filtering technique, which generates accurate pseudo labels for model training by utilizing the points and the box localization confidences. Extensive experiments on MSCOCO demonstrate that our method outperforms existing automated annotation methods. In particular, when using 95% point-labeled and 5% fully labeled data, our approach reduces the annotation time by approximately 52% and achieves an annotation quality of 87.4% mIoU.
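The abstract does not spell out how the pseudo-label filtering works, so the following is only a minimal sketch of one plausible reading: for each annotated point, keep the teacher-predicted box that contains the point and has the highest localization confidence, and discard the point's pseudo label if no box passes a confidence threshold. The names `Box`, `loc_conf`, `filter_pseudo_labels`, and the threshold `min_conf` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Box:
    x1: float
    y1: float
    x2: float
    y2: float
    loc_conf: float  # assumed: teacher's localization confidence in [0, 1]


def contains(box: Box, point: Point) -> bool:
    """Return True if the point lies inside (or on the border of) the box."""
    px, py = point
    return box.x1 <= px <= box.x2 and box.y1 <= py <= box.y2


def filter_pseudo_labels(
    candidates: List[Box],
    points: List[Point],
    min_conf: float = 0.5,  # hypothetical threshold, not from the paper
) -> List[Optional[Box]]:
    """For each point, pick the most confident candidate box containing it.

    Returns None for a point when no candidate both contains the point
    and reaches the confidence threshold.
    """
    selected: List[Optional[Box]] = []
    for point in points:
        valid = [
            b for b in candidates
            if contains(b, point) and b.loc_conf >= min_conf
        ]
        selected.append(max(valid, key=lambda b: b.loc_conf) if valid else None)
    return selected


# Toy teacher predictions and two annotated points (made-up values).
candidates = [
    Box(10, 10, 50, 50, 0.9),
    Box(12, 8, 60, 55, 0.6),
    Box(100, 100, 150, 150, 0.3),
]
points = [(30.0, 30.0), (120.0, 120.0)]
pseudo = filter_pseudo_labels(candidates, points)
```

In this toy input, the first point is covered by two candidates and receives the more confident one, while the second point is covered only by a low-confidence box and therefore gets no pseudo label. The actual method may combine the point and confidence cues differently.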

Data Availability Statement

The datasets generated and analysed during the current study are not publicly available due to the excessive size of MSCOCO but are available from the corresponding author on reasonable request.

Acknowledgments

This work was supported by the NSFC fund (62171288), the Shenzhen Fundamental Research fund under Grants 20200810150441003 and JCYJ20190808143415801, and the Guangdong Basic and Applied Basic Research Foundation under Grants 2020A1515011559 and 2021A1515012287.

Author information

Corresponding author

Correspondence to Xingzheng Wang.

Ethics declarations

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, X., Wei, G., Chen, S. et al. An efficient weakly semi-supervised method for object automated annotation. Multimed Tools Appl 83, 9417–9440 (2024). https://doi.org/10.1007/s11042-023-15305-0

  • DOI: https://doi.org/10.1007/s11042-023-15305-0
