Emotion Aware Reinforcement Network for Visual Storytelling

  • Conference paper
Artificial Neural Networks and Machine Learning – ICANN 2022 (ICANN 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13530)


Abstract

Visual storytelling is the task of generating a sequence of human-like sentences (i.e., a story) for an ordered stream of images. Unlike traditional image captioning, the story contains not only factual descriptions but also concepts and objects that do not explicitly appear in the input images. Recent works utilize either end-to-end or multi-stage frameworks to produce more relevant and coherent stories, but they usually ignore latent emotional information. In this work, to generate an affective story, we propose an Emotion Aware Reinforcement Network for VIsual StoryTelling (EARN-VIST). Specifically, our network leverages lexicon-based attention to encourage the model to attend to emotional words. We then apply two emotional-consistency reinforcement learning rewards, computed with an emotion classifier and a commonsense transformer respectively, to measure the gap between the generated story and the human-labeled story and thereby refine the generation process. Experimental results on the VIST dataset and human evaluation demonstrate that our model outperforms most cutting-edge models across multiple evaluation metrics.
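The abstract's reward recipe can be made concrete with a small sketch: score how often the generated story's per-sentence emotions agree with the gold story's, then use that agreement as a sequence-level reward. Everything below is illustrative, not the authors' implementation: `classify` is a toy lexicon stand-in for the paper's fast-bert emotion classifier (the COMET-based reward is analogous), and `reinforce_loss` is a generic REINFORCE-with-baseline formulation in the spirit of sequence-level training [18].

```python
# Hedged sketch of an emotional-consistency RL reward. The tiny lexicon and
# the stand-in classifier exist only so the example runs end to end; the
# paper uses a trained fast-bert emotion classifier and COMET instead.

EMOTION_LEXICON = {
    "furious": "anger", "terrified": "fear",
    "wept": "sadness", "sad": "sadness",
    "delighted": "happiness", "happy": "happiness",
}

def classify(sentence: str) -> str:
    """Toy stand-in classifier: first lexicon emotion found, else 'neutral'."""
    for word in sentence.lower().split():
        if word in EMOTION_LEXICON:
            return EMOTION_LEXICON[word]
    return "neutral"

def emotion_consistency_reward(generated: list[str], gold: list[str]) -> float:
    """Fraction of sentences whose predicted emotion matches the emotion of
    the gold (human-labeled) story; usable as a sequence-level RL reward."""
    matches = sum(classify(g) == classify(h) for g, h in zip(generated, gold))
    return matches / len(gold)

def reinforce_loss(log_prob_sum: float, reward: float, baseline: float) -> float:
    """REINFORCE-style loss: the reward advantage scales the story's
    summed token log-likelihood (hypothetical objective, for illustration)."""
    return -(reward - baseline) * log_prob_sum

if __name__ == "__main__":
    gen = ["she was delighted to see the cake .", "he wept at the station ."]
    ref = ["she was so happy at the party .", "he was sad to leave ."]
    print(f"reward = {emotion_consistency_reward(gen, ref):.2f}")  # reward = 1.00
```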


Notes

  1. We build the emotion vocabulary from the NRC Affect Intensity Lexicon [15]. The vocabulary contains four emotion categories: anger, fear, sadness, and happiness (see the sketch after these notes).

  2. https://github.com/utterworks/fast-bert.

  3. https://github.com/atcbosselut/comet-commonsense.

  4. Note that the gold story refers to the manually annotated story in the VIST dataset.

  5. https://visionandlanguage.net/VIST/.

  6. https://competitions.codalab.org/competitions/17751.
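For concreteness, here is a minimal sketch of how the footnote-1 emotion vocabulary could be assembled from the lexicon file. The tab-separated "term, score, emotion" layout, the `min_intensity` threshold, and the mapping of the lexicon's joy dimension onto the paper's happiness category are all assumptions of this sketch, not details given by the paper.

```python
# Minimal sketch: bucket NRC Affect Intensity Lexicon [15] terms into the
# four categories used by the paper. Assumes tab-separated lines of the
# form "term<TAB>score<TAB>emotion"; verify against the header of the
# lexicon release you download.

from collections import defaultdict

def build_emotion_vocabulary(path: str,
                             min_intensity: float = 0.5) -> dict[str, set[str]]:
    """Group lexicon terms by emotion, keeping only high-intensity words.

    The 0.5 threshold is an illustrative choice, and the lexicon's 'joy'
    dimension is renamed to 'happiness' to match the paper's categories.
    """
    rename = {"anger": "anger", "fear": "fear",
              "sadness": "sadness", "joy": "happiness"}
    vocab: dict[str, set[str]] = defaultdict(set)
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split("\t")
            if len(parts) != 3:
                continue  # skip blank or malformed lines
            term, score, emotion = parts
            try:
                intensity = float(score)
            except ValueError:
                continue  # skip the header row
            if emotion in rename and intensity >= min_intensity:
                vocab[rename[emotion]].add(term)
    return dict(vocab)
```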

References

  1. Banerjee, S., Lavie, A.: METEOR: an automatic metric for MT evaluation with improved correlation with human judgments. In: Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, pp. 65–72 (2005)

  2. Bosselut, A., Rashkin, H., Sap, M., Malaviya, C., Celikyilmaz, A., Choi, Y.: COMET: commonsense transformers for automatic knowledge graph construction. arXiv preprint arXiv:1906.05317 (2019)

  3. Brahman, F., Chaturvedi, S.: Modeling protagonist emotions for emotion-aware storytelling. arXiv preprint arXiv:2010.06822 (2020)

  4. Chen, H., Huang, Y., Takamura, H., Nakayama, H.: Commonsense knowledge aware concept selection for diverse and informative visual storytelling. arXiv preprint arXiv:2102.02963 (2021)

  5. Gu, S., Wang, W., Wang, F., Huang, J.H.: Neuromodulator and emotion biomarker for stress induced mental disorders. Neural Plasticity 2016 (2016)

  6. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)

  7. Hsu, C.C., et al.: Knowledge-enriched visual storytelling. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 7952–7960 (2020)

  8. Hsu, C.Y., Chu, Y.W., Huang, T.H., Ku, L.W.: Plot and rework: modeling storylines for visual storytelling. arXiv preprint arXiv:2105.06950 (2021)

  9. Hu, J., Cheng, Y., Gan, Z., Liu, J., Gao, J., Neubig, G.: What makes a good story? Designing composite rewards for visual storytelling. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 7969–7976 (2020)

  10. Huang, Q., Gan, Z., Celikyilmaz, A., Wu, D., Wang, J., He, X.: Hierarchically structured reinforcement learning for topically coherent visual story generation. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 8465–8472 (2019)

  11. Huang, T.H., et al.: Visual storytelling. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 1233–1239 (2016)

  12. Jung, Y., Kim, D., Woo, S., Kim, K., Kim, S., Kweon, I.S.: Hide-and-tell: learning to bridge photo streams for visual storytelling. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 11213–11220 (2020)

  13. Lin, C.Y.: ROUGE: a package for automatic evaluation of summaries. In: Text Summarization Branches Out, pp. 74–81 (2004)

  14. Mohammad, S., Bravo-Marquez, F., Salameh, M., Kiritchenko, S.: SemEval-2018 Task 1: affect in tweets. In: Proceedings of the 12th International Workshop on Semantic Evaluation, pp. 1–17 (2018)

  15. Mohammad, S.M.: Word affect intensities. arXiv preprint arXiv:1704.08798 (2017)

  16. Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pp. 311–318 (2002)

  17. Qi, M., Qin, J., Huang, D., Shen, Z., Yang, Y., Luo, J.: Latent memory-augmented graph transformer for visual storytelling. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4892–4901 (2021)

  18. Ranzato, M., Chopra, S., Auli, M., Zaremba, W.: Sequence level training with recurrent neural networks. arXiv preprint arXiv:1511.06732 (2015)

  19. Sap, M., et al.: ATOMIC: an atlas of machine commonsense for if-then reasoning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 3027–3035 (2019)

  20. Speer, R., Chin, J., Havasi, C.: ConceptNet 5.5: an open multilingual graph of general knowledge. In: Thirty-First AAAI Conference on Artificial Intelligence (2017)

  21. Vedantam, R., Lawrence Zitnick, C., Parikh, D.: CIDEr: consensus-based image description evaluation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4566–4575 (2015)

  22. Wang, R., Wei, Z., Li, P., Zhang, Q., Huang, X.: Storytelling from an image stream using scene graphs. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 9185–9192 (2020)

  23. Wang, X., Chen, W., Wang, Y.F., Wang, W.Y.: No metrics are perfect: adversarial reward learning for visual storytelling. arXiv preprint arXiv:1804.09160 (2018)

  24. Xu, C., Yang, M., Li, C., Shen, Y., Ao, X., Xu, R.: Imagine, reason and write: visual storytelling with graph knowledge and relational reasoning. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 3022–3029 (2021)


Acknowledgements

Supported by the National Natural Science Foundation of China (Nos. 61972059, 61773272, 61602332); the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (No. 19KJA230001); the Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University (No. 93K172016K08); and the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD).

Author information


Corresponding author

Correspondence to Yi Ji.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Li, X., Cai, H., Jiang, T., Liu, C., Ji, Y. (2022). Emotion Aware Reinforcement Network for Visual Storytelling. In: Pimenidis, E., Angelov, P., Jayne, C., Papaleonidas, A., Aydin, M. (eds) Artificial Neural Networks and Machine Learning – ICANN 2022. ICANN 2022. Lecture Notes in Computer Science, vol 13530. Springer, Cham. https://doi.org/10.1007/978-3-031-15931-2_3


  • DOI: https://doi.org/10.1007/978-3-031-15931-2_3


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-15930-5

  • Online ISBN: 978-3-031-15931-2

  • eBook Packages: Computer Science, Computer Science (R0)
