Object Detector with Recursive Feature Pyramid and Key Content-Only Attention

Lu, Yuchao; Zhang, Tao; **, Jiali; Zhang, Li

doi:10.1007/978-3-031-15934-3_20

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13531)

Included in the following conference series:

International Conference on Artificial Neural Networks

Abstract

To detect objects, human visual perception focuses on the key content of interest and then transmits high-level semantic information through feedback connections, selectively enhancing and suppressing neuronal activation. Inspired by the human visual system, our detector incorporates a recursive feature pyramid that feeds the output of the FPN back into the backbone network, so that the features the backbone learns in its second pass are better adapted to the detection task. Furthermore, to balance accuracy and efficiency, we propose a key content-only attention mechanism, which combines the key content-only attention term with deformable convolution to achieve the best trade-off between the two. Each of the two components improves our baseline AP by more than 4%, and combining them further enhances the performance of our detector. On COCO test-dev, our detector achieves 45.1% box AP with ResNet-50.
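As an illustration only, the following minimal sketch shows one common reading of a "key content-only" attention term: the attention weight of each key depends solely on the key's own content, not on the query. The function names, the saliency vector `u`, and the toy data are assumptions introduced here; the sketch is not the chapter's implementation and omits the deformable convolution that the abstract pairs with this term.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def key_content_only_attention(
    keys: List[Vector], values: List[Vector], u: Vector
) -> Vector:
    """Aggregate values with weights that depend only on key content.

    The score of key k is dot(u, keys[k]). No query enters the score,
    so every query position would receive the same aggregated vector.
    `u` is a hypothetical learned saliency vector.
    """
    weights = softmax([dot(u, k) for k in keys])
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


# Toy data (assumed for illustration): three key/value pairs of dimension 2.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]

# A zero saliency vector scores every key equally, so the result is the
# plain average of the values.
uniform_context = key_content_only_attention(keys, values, [0.0, 0.0])

# A large saliency vector concentrates the weight on the highest-scoring key.
peaked_context = key_content_only_attention(keys, values, [10.0, 10.0])
```

Because the weights ignore the query, they can be computed once per feature map and shared across all positions, which is one plausible reason such a term is cheaper than full query-key attention.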

Author information

Authors and Affiliations

School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, 214122, China
Yuchao Lu, Tao Zhang, Jiali ** & Li Zhang

Corresponding author

Correspondence to Tao Zhang.

Editor information

Editors and Affiliations

University of the West of England, Bristol, UK
Elias Pimenidis
Lancaster University, Lancaster, UK
Plamen Angelov
Digital Innovation, Teesside University, Middlesbrough, UK
Chrisina Jayne
Democritus University of Thrace, Xanthi, Greece
Antonios Papaleonidas
The University of the West of England, Bristol, UK
Mehmet Aydin

Cite this paper

Lu, Y., Zhang, T., **, J., Zhang, L. (2022). Object Detector with Recursive Feature Pyramid and Key Content-Only Attention. In: Pimenidis, E., Angelov, P., Jayne, C., Papaleonidas, A., Aydin, M. (eds) Artificial Neural Networks and Machine Learning – ICANN 2022. ICANN 2022. Lecture Notes in Computer Science, vol 13531. Springer, Cham. https://doi.org/10.1007/978-3-031-15934-3_20

DOI: https://doi.org/10.1007/978-3-031-15934-3_20
Published: 15 September 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-15933-6
Online ISBN: 978-3-031-15934-3
eBook Packages: Computer Science, Computer Science (R0)
