Attention-Based DCNs

Chapter in the book Deep Cognitive Networks

Part of the book series: SpringerBriefs in Computer Science (BRIEFSCOMPUTER)

Abstract

This chapter first provides a brief overview of attention-based Deep Cognitive Networks (DCNs). Representative models are then introduced and analyzed from two perspectives, hard attention and soft attention, together with their relations to important theories, computational models, and experimental evidence in cognitive psychology. Finally, the chapter is briefly summarized.
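
As a rough illustration of the two families the chapter covers, the following minimal NumPy sketch (illustrative only, not taken from the book) contrasts soft attention, which forms a differentiable weighted average over all input features, with hard attention, which selects a single feature and is therefore typically trained with reinforcement-learning techniques.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy setup: N candidate features (e.g., image regions) of dimension D,
    # scored against a task-dependent query vector.
    N, D = 5, 4
    features = rng.normal(size=(N, D))
    query = rng.normal(size=(D,))

    scores = features @ query                        # one relevance score per feature
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax over features

    # Soft attention: a differentiable weighted average of all features.
    soft_output = weights @ features

    # Hard attention: sample a single feature index; the sampling step is
    # non-differentiable, hence the common use of REINFORCE-style training.
    index = rng.choice(N, p=weights)
    hard_output = features[index]

    print(soft_output.shape, hard_output.shape)      # (4,), (4,)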

Notes

  1. Although other forms of soft attention also impose some competition on the input features through the softmax function, they behave more like normalization operations, since each input feature is processed independently of the others.
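
To make this distinction concrete, here is a small, hypothetical NumPy sketch (not taken from the chapter; all names are illustrative). In the first variant each feature is scored on its own and the shared softmax denominator is the only cross-feature coupling, so the result is essentially a normalization of independently computed saliencies; in the second, self-attention-style variant, every weight depends on pairwise interactions between features, a genuinely competitive computation.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=(5, 4))     # 5 input features of dimension 4

    def softmax(s, axis=-1):
        e = np.exp(s - s.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    # Variant A: each feature scores itself via a fixed projection; the shared
    # softmax denominator is the only interaction between features, so this
    # acts like a normalization of independently computed saliencies.
    w = rng.normal(size=(4,))       # stand-in for learned scoring parameters
    independent_scores = x @ w      # score i depends on feature i alone
    a_normalization_like = softmax(independent_scores)

    # Variant B: self-attention-style scoring, where each weight reflects
    # pairwise feature interactions, i.e., the features genuinely compete.
    pairwise_scores = x @ x.T       # score (i, j) couples features i and j
    a_competitive = softmax(pairwise_scores, axis=-1)

    print(a_normalization_like.shape, a_competitive.shape)   # (5,), (5, 5)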

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Huang, Y., Wang, L. (2023). Attention-Based DCNs. In: Deep Cognitive Networks. SpringerBriefs in Computer Science. Springer, Singapore. https://doi.org/10.1007/978-981-99-0279-8_3

  • DOI: https://doi.org/10.1007/978-981-99-0279-8_3

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-0278-1

  • Online ISBN: 978-981-99-0279-8

  • eBook Packages: Computer Science, Computer Science (R0)
