Multi-task feature-aligned head in one-stage object detection

  • Original Paper
  • Published in: Signal, Image and Video Processing

Abstract

Existing one-stage detectors usually optimize their two subtasks, object localization and classification, with two decoupled branches. However, this design paradigm leads to spatial feature misalignment, because the two branches are optimized inconsistently. To mitigate this problem, we propose a simple plug-in AF-Head (Aligned-Features Head) that generates aligned features for each task. The AF-Head contains a Focus-Guided Feature Enhancement Module (FGM) and an Auxiliary Positioning Module (APM). Specifically, in the FGM we propose a focus branch that forms a joint representation of localization confidence and classification scores; we then combine the focus and classification branches to narrow the gap between training and inference. In addition, the APM generates more accurate offsets for the localization branch so that it aligns with the classification branch. Building on the AF-Head, we further propose AF-Net. Extensive experiments on MS-COCO demonstrate that the AF-Head boosts different state-of-the-art one-stage detectors by 0.7–1.7 AP. Notably, AF-Net with a standard ResNeXt-101-32x4d-DCN backbone achieves 49.2 AP on the COCO test-dev.
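The joint representation of localization confidence and classification scores described in the abstract can be sketched as follows. This is a minimal illustration only, assuming a geometric-mean-style fusion used to rank candidate boxes at inference time; the function `focus_score` and the balancing exponent `alpha` are hypothetical names for illustration, not the paper's exact formulation.

```python
import numpy as np

def focus_score(cls_score, loc_conf, alpha=0.5):
    """Hypothetical joint score fusing the classification score with the
    predicted localization confidence (e.g., an IoU estimate).
    alpha balances the two terms; alpha=0.5 gives a geometric mean."""
    return cls_score ** (1.0 - alpha) * loc_conf ** alpha

# Two candidate boxes for the same object: one with a high class score
# but poor localization, one with balanced quality on both subtasks.
cls = np.array([0.9, 0.7])   # classification scores
iou = np.array([0.4, 0.8])   # predicted localization confidences
joint = focus_score(cls, iou)
order = np.argsort(-joint)   # ranking that would feed NMS at inference
```

Under this fusion, the well-localized box outranks the box that scores high only on classification, which is the kind of training/inference gap the focus branch is meant to alleviate.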


Availability of data and materials

The datasets generated or analyzed during this study are available in the MS-COCO2017 repository, https://cocodataset.org/download.


Acknowledgements

The authors are indebted to the reviewers for their critical comments and suggestions, which improved this paper.

Funding

This work was supported by National Key Research and Development Program of China (2021YFA1000102) and in part by the Grants from the National Natural Science Foundation of China (Nos. 61673396, 61976245).

Author information

Authors and Affiliations

Authors

Contributions

All authors read and approved the final manuscript. The authors' individual contributions are as follows: ZL: investigation, writing (original draft). MS: conceptualization, supervision, validation. YS: methodology, data curation. ZP: formal analysis, writing (review and editing).

Corresponding author

Correspondence to Mingwen Shao.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, Z., Shao, M., Sun, Y. et al. Multi-task feature-aligned head in one-stage object detection. SIViP 17, 1345–1353 (2023). https://doi.org/10.1007/s11760-022-02342-9
