
RETRACTED ARTICLE: ICDN: integrating consistency and difference networks by transformer for multimodal sentiment analysis

Published in Applied Intelligence

This article was retracted on 10 July 2023


Abstract

Human sentiment is usually expressed through multiple modalities, such as natural language, facial expressions, and voice intonation. However, previous methods treat time-series alignment uniformly across modalities and ignore missing fragments of modal information; this partial absence of multimodal information remains the main challenge. In this work, we propose the Integrating Consistency and Difference Networks (ICDN), which models interactions between modalities through mapping and generalization learning and includes a cross-modal Transformer designed to map other modalities onto the target modality. Unimodal sentiment labels are then obtained through self-supervision to guide the final sentiment analysis. Compared with other popular multimodal sentiment analysis methods, ICDN achieves better sentiment classification results on the CMU-MOSI and CMU-MOSEI benchmark datasets.
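The abstract describes a cross-modal Transformer that maps other modalities onto the target modality. As an illustration only (the paper's exact layer design and released code are not reproduced here), the following minimal PyTorch sketch shows the general cross-modal attention pattern: queries come from the target modality and keys/values from a source modality, so the source sequence is re-expressed along the target's time steps. All class names, dimensions, and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a cross-modal Transformer block (not the authors' code):
# the target modality supplies the queries, another modality supplies the
# keys/values, so the source modality is projected onto the target's time steps.
import torch
import torch.nn as nn


class CrossModalBlock(nn.Module):
    def __init__(self, dim: int = 64, heads: int = 4, ff_mult: int = 4, dropout: float = 0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, dropout=dropout, batch_first=True)
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        self.ff = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, ff_mult * dim),
            nn.GELU(),
            nn.Linear(ff_mult * dim, dim),
        )

    def forward(self, target: torch.Tensor, source: torch.Tensor) -> torch.Tensor:
        # target: (batch, T_target, dim), e.g. text features
        # source: (batch, T_source, dim), e.g. acoustic or visual features
        q = self.norm_q(target)
        kv = self.norm_kv(source)
        attended, _ = self.attn(q, kv, kv)   # queries from target, keys/values from source
        x = target + attended                # residual connection
        return x + self.ff(x)                # position-wise feed-forward with residual


if __name__ == "__main__":
    text = torch.randn(8, 50, 64)    # 50 text tokens (unaligned with audio)
    audio = torch.randn(8, 120, 64)  # 120 acoustic frames
    fused = CrossModalBlock()(text, audio)
    print(fused.shape)               # torch.Size([8, 50, 64])
```

Because attention handles sequences of different lengths, a block of this kind works on unaligned modality streams, which matches the abstract's point about not forcing uniform time-series alignment.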






Acknowledgements

This work was supported in part by the National Social Science Foundation under Award 19BYY076, in part by the Key R&D Project of Shandong Province under Award 2019JZZY010129, and in part by the Shandong Provincial Social Science Planning Project under Awards 19BJCJ51, 18CXWJ01, and 18BJYJ04.

Author information

Authors and Affiliations

Authors

Corresponding authors

Correspondence to Lei Shi, Peiyu Liu or Liancheng Xu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest with any person or institution regarding the publication of this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article has been retracted. Please see the retraction notice for more detail: https://doi.org/10.1007/s10489-023-04869-x

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article


Cite this article

Zhang, Q., Shi, L., Liu, P. et al. RETRACTED ARTICLE: ICDN: integrating consistency and difference networks by transformer for multimodal sentiment analysis. Appl Intell 53, 16332–16345 (2023). https://doi.org/10.1007/s10489-022-03343-4


