DXVNet Multimodal Classification Recognition Network Based on Dissymmetric Bidirectional D-GRU

  • Conference paper
  • First Online:
Proceeding of 2022 International Conference on Wireless Communications, Networking and Applications (WCNA 2022)

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 1059)

Abstract

At present, research on classification networks often suffers from either a single input data type or a fixed model structure, both of which limit a network's ability to cope with practical applications. To address these problems, this paper designs and implements DXVNet, a new multimodal classification and recognition network. First, the traditional bidirectional GRU structure is reconstructed into a new dissymmetric bidirectional D-GRU structure: rather than simply concatenating the outputs of the past-direction and future-direction passes, the structure is rebuilt so that output information from the past direction is transferred into the future direction. This resolves the problem of insufficient internal linkage among text features and allows useful past information to accumulate more effectively. Second, an Xception model and a VGG16 model, both pre-trained on ImageNet, are used to extract two different levels of image features from the same image dataset; these image features are then concatenated with the text features extracted by the D-GRU for multimodal feature fusion. Finally, the fused features are fed into a fully connected layer and classified by Softmax, forming the complete DXVNet multimodal classification network. Experimental results show that DXVNet outperforms the baseline unimodal models and most multimodal models: on the two datasets, accuracy improves by about 3.2%–7.63% and 2.94%–7.07%, respectively.
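
The abstract leaves the exact wiring of the dissymmetric D-GRU to the paper body, so the following is only a sketch of one plausible reading, written in TensorFlow/Keras (a framework choice assumed here, not stated by the authors): the forward ("past") GRU runs first over the whole sequence, and the backward ("future") GRU consumes the token embeddings concatenated with the forward hidden states, so accumulated past information is transferred into the future direction instead of the two passes being joined only at their outputs. All layer sizes and names are illustrative assumptions.

```python
# Sketch of a dissymmetric bidirectional D-GRU text branch (assumed reading).
from tensorflow.keras import layers

VOCAB_SIZE = 20_000  # assumed vocabulary size
SEQ_LEN = 100        # assumed padded sequence length
EMB_DIM = 300        # e.g. 300-d GloVe vectors, as cited by the paper
HIDDEN = 128         # assumed GRU width

text_in = layers.Input(shape=(SEQ_LEN,), dtype="int32", name="tokens")
emb = layers.Embedding(VOCAB_SIZE, EMB_DIM)(text_in)

# Forward ("past") pass, keeping the hidden state at every time step.
fwd = layers.GRU(HIDDEN, return_sequences=True, name="gru_past")(emb)

# Backward ("future") pass: unlike a symmetric Bi-GRU, its input is the
# embeddings concatenated with the forward states, so past information
# flows into the future direction rather than being merged only at the end.
bwd_in = layers.Concatenate()([emb, fwd])
text_features = layers.GRU(HIDDEN, go_backwards=True, name="gru_future")(bwd_in)
```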

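The fusion stage then follows the abstract directly: the same image, resized to each backbone's expected resolution, passes through ImageNet-pretrained Xception and VGG16; the two resulting feature vectors are concatenated with the D-GRU text features; and a fully connected Softmax layer classifies the fused vector. Continuing the sketch above, with the class count, pooling, and layer-freezing choices being assumptions rather than the authors' reported configuration:

```python
# Sketch of the DXVNet fusion head (assumptions as noted in the text).
from tensorflow.keras import Model
from tensorflow.keras.applications import Xception, VGG16

NUM_CLASSES = 101  # assumed, e.g. UPMC Food-101 as in the cited related work

img_x = layers.Input(shape=(299, 299, 3), name="image_for_xception")
img_v = layers.Input(shape=(224, 224, 3), name="image_for_vgg16")

# Frozen ImageNet backbones used purely as feature extractors,
# global-average-pooled to one vector per image.
xcep = Xception(weights="imagenet", include_top=False, pooling="avg")
vgg = VGG16(weights="imagenet", include_top=False, pooling="avg")
xcep.trainable = False
vgg.trainable = False

# Multimodal feature fusion: two levels of image features + text features.
fused = layers.Concatenate(name="fusion")([xcep(img_x), vgg(img_v), text_features])
out = layers.Dense(NUM_CLASSES, activation="softmax", name="classifier")(fused)

dxvnet = Model(inputs=[text_in, img_x, img_v], outputs=out)
dxvnet.compile(optimizer="adam", loss="categorical_crossentropy",
               metrics=["accuracy"])
```

Here the two backbones stand in for the abstract's "two different levels of image features"; whether the original model fine-tunes them rather than freezing them is not specified in the abstract.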

References

  1. Liu, J., Zhang, P., Liu, Y., et al.: Review of multimodal emotion analysis techniques. J. Comput. Sci. Explor. 15(7), 1165–1182 (2021). (in Chinese)

  2. Lin, M., Meng, Z.: Multimodal sentiment analysis based on attention neural network. Comput. Sci. 47(11A), 508–514 (2020). (in Chinese)

  3. Schuster, M., Paliwal, K.K.: Bidirectional recurrent neural networks. IEEE Trans. Signal Process. 45(11), 2673–2681 (1997)

  4. Chollet, F.: Xception: deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017)

  5. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)

  6. Loper, E., Bird, S.: NLTK: the natural language toolkit. arXiv preprint cs/0205028 (2002)

  7. Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543 (2014)

  8. He, K., Zhang, X., Ren, S., et al.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)

  9. Zhang, Y., Wallace, B.: A sensitivity analysis of (and practitioners' guide to) convolutional neural networks for sentence classification. arXiv preprint arXiv:1510.03820 (2015)

  10. Liu, P., Qiu, X., Huang, X.: Recurrent neural network for text classification with multi-task learning. arXiv preprint arXiv:1605.05101 (2016)

  11. Gallo, I., Ria, G., Landro, N., et al.: Image and text fusion for UPMC Food-101 using BERT and CNNs. In: 2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ), pp. 1–6. IEEE (2020)

  12. Hu, A., Flaxman, S.: Multimodal sentiment analysis to explore the structure of emotions. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 350–358 (2018)

  13. Gallo, I., Calefati, A., Nawaz, S., et al.: Image and encoded text fusion for multi-modal classification. In: 2018 Digital Image Computing: Techniques and Applications (DICTA), pp. 1–7. IEEE (2018)

  14. Sharma, M., Kandasamy, I., Vasantha, W.: Memebusters at SemEval-2020 task 8: feature fusion model for sentiment analysis on memes using transfer learning. In: Proceedings of the Fourteenth Workshop on Semantic Evaluation, pp. 1163–1171 (2020)

Acknowledgments

This research is supported by the Erasmus+ Capacity Building in Higher Education project "Building Skills4.0 Through University and Enterprise Collaboration" (598649-EPP-1-2018-1-FR-EPPKA2-CBHE-JP), the National Natural Science Foundation of China (NSFC) (No. 61602064), the Science and Technology Agency Project of Sichuan Province (No. 2021YFH0107), and the European Union funded project (598649-EPP-1-2018-1-FR-EPPKA2-CBHE-JP).

Author information

Corresponding author

Correspondence to Haiqing Zhang.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Li, H., Li, D., Zhang, H., Yu, X., Tang, D., He, L. (2023). DXVNet Multimodal Classification Recognition Network Based on Dissymmetric Bidirectional D-GRU. In: Qian, Z., Jabbar, M., Cheung, S.K.S., Li, X. (eds) Proceeding of 2022 International Conference on Wireless Communications, Networking and Applications (WCNA 2022). WCNA 2022. Lecture Notes in Electrical Engineering, vol 1059. Springer, Singapore. https://doi.org/10.1007/978-981-99-3951-0_28

  • DOI: https://doi.org/10.1007/978-981-99-3951-0_28

  • Published:

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-3950-3

  • Online ISBN: 978-981-99-3951-0

  • eBook Packages: Engineering, Engineering (R0)
