Abstract
Research on classification networks often suffers from restriction to a single data type or a fixed model structure, which limits a network's ability to handle practical applications. To address these problems, this paper designs and implements DXVNet, a new multimodal classification and recognition network. First, the traditional bidirectional GRU is reconstructed into a new asymmetric bidirectional D-GRU structure: rather than simply concatenating the outputs of the past and future directions in series, the structure allows output information to be transferred from the past direction, which resolves the insufficient internal linkage among text features and better accumulates useful information from the past. Second, the Xception and VGG16 models, both pre-trained on ImageNet, are used to extract two different levels of image features from the same image dataset. These two levels of image features are then concatenated with the text features extracted by the D-GRU to form a multimodal feature representation. Finally, the fused features are fed into a fully connected layer and classified by Softmax, yielding the DXVNet multimodal classification network. Experimental results show that DXVNet outperforms the unimodal baselines and most multimodal models: on the two datasets, accuracy improves by about 3.2%–7.63% and 2.94%–7.07%, respectively.
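The late-fusion stage described above (D-GRU text features concatenated with two levels of pre-trained image features, then a fully connected layer with Softmax) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the feature dimensions, the batch size, and the `fuse_and_classify` helper are all illustrative assumptions (2048 and 512 are the usual pooled output widths of Xception and VGG16, but the paper's actual sizes are not given in the abstract).

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def fuse_and_classify(text_feat, xcep_feat, vgg_feat, W, b):
    """Concatenate per-modality feature vectors, then apply a
    fully connected layer followed by Softmax (the fusion head
    described in the abstract)."""
    fused = np.concatenate([text_feat, xcep_feat, vgg_feat], axis=-1)
    return softmax(fused @ W + b)

# Toy dimensions (illustrative only).
rng = np.random.default_rng(0)
text_feat = rng.standard_normal((4, 128))    # text features from the D-GRU
xcep_feat = rng.standard_normal((4, 2048))   # image features from Xception
vgg_feat = rng.standard_normal((4, 512))     # image features from VGG16
num_classes = 101
W = rng.standard_normal((128 + 2048 + 512, num_classes)) * 0.01
b = np.zeros(num_classes)

probs = fuse_and_classify(text_feat, xcep_feat, vgg_feat, W, b)
print(probs.shape)  # (4, 101)
```

In practice the dense weights `W`, `b` would be trained jointly with the rest of the network; the sketch only shows how the three feature streams meet in the fusion layer.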
Acknowledgments
This research is supported by the Erasmus+ Capacity Building in Higher Education project "Building Skills4.0 Through University and Enterprise Collaboration" (598649-EPP-1-2018-1-FR-EPPKA2-CBHE-JP), funded by the European Union, the National Natural Science Foundation of China (NSFC) (No. 61602064), and the Science and Technology Agency Project of Sichuan Province (No. 2021YFH0107).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Li, H., Li, D., Zhang, H., Yu, X., Tang, D., He, L. (2023). DXVNet Multimodal Classification Recognition Network Based on Dissymmetric Bidirectional D-GRU. In: Qian, Z., Jabbar, M., Cheung, S.K.S., Li, X. (eds) Proceeding of 2022 International Conference on Wireless Communications, Networking and Applications (WCNA 2022). WCNA 2022. Lecture Notes in Electrical Engineering, vol 1059. Springer, Singapore. https://doi.org/10.1007/978-981-99-3951-0_28
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-3950-3
Online ISBN: 978-981-99-3951-0
eBook Packages: Engineering (R0)