Abstract
This chapter first provides a brief overview of attention-based Deep Cognitive Networks (DCNs). It then introduces and analyzes representative models from two perspectives, hard attention and soft attention, and relates them to important theories, computational models, and experimental evidence in cognitive psychology. Finally, the chapter is briefly summarized.
Notes
1. Although other forms of soft attention also impose a degree of competition among the input features through the softmax function, they act more like normalization operations, since each input feature is processed independently of the others.
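As a minimal illustration of the softmax weighting discussed in this note, the following sketch shows how soft attention turns per-feature relevance scores into competing weights that sum to one (all names here are hypothetical and not taken from the chapter):

```python
import numpy as np

def soft_attention(features, query):
    """Illustrative soft attention: weight input features by a query.

    features: (n, d) array of n input feature vectors
    query:    (d,) query vector
    Returns the attention-weighted sum of the features.
    """
    scores = features @ query                # one relevance score per feature
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()                 # weights compete: they sum to 1
    return weights @ features                # convex combination of features
```

Because the softmax normalizes the scores jointly, raising one feature's score necessarily lowers the weights of all the others, which is the competitive behavior the note contrasts with independent, per-feature normalization.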
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this chapter
Huang, Y., Wang, L. (2023). Attention-Based DCNs. In: Deep Cognitive Networks. SpringerBriefs in Computer Science. Springer, Singapore. https://doi.org/10.1007/978-981-99-0279-8_3
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-0278-1
Online ISBN: 978-981-99-0279-8