Abstract
Research on classification networks often suffers from restriction to a single data type or a fixed model structure, which limits a network's ability to handle practical applications. To address these problems, this paper designs and implements DXVNet, a new multimodal classification and recognition network. First, the traditional bidirectional GRU is reconstructed into a new asymmetric bidirectional D-GRU structure: rather than simply concatenating the outputs of the past and future directions in series, the structure allows output information to be transferred from the past direction, which resolves the insufficient internal linkage among text features and better accumulates useful information from the past. Second, the Xception and VGG16 models, both pre-trained on ImageNet, are used to extract two different levels of image features from the same image dataset. These two levels of image features are then concatenated with the text features extracted by the D-GRU to form a multimodal feature representation. Finally, the fused features are fed into a fully connected layer and classified by Softmax, yielding the DXVNet multimodal classification network. Experimental results show that DXVNet outperforms the unimodal baselines and most multimodal models: on the two datasets, accuracy improves by about 3.2%–7.63% and 2.94%–7.07%, respectively.
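The late-fusion stage described above (D-GRU text features concatenated with two levels of pre-trained image features, then a fully connected layer with Softmax) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the feature dimensions, the batch size, and the `fuse_and_classify` helper are all illustrative assumptions (2048 and 512 are the usual pooled output widths of Xception and VGG16, but the paper's actual sizes are not given in the abstract).

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def fuse_and_classify(text_feat, xcep_feat, vgg_feat, W, b):
    """Concatenate per-modality feature vectors, then apply a
    fully connected layer followed by Softmax (the fusion head
    described in the abstract)."""
    fused = np.concatenate([text_feat, xcep_feat, vgg_feat], axis=-1)
    return softmax(fused @ W + b)

# Toy dimensions (illustrative only).
rng = np.random.default_rng(0)
text_feat = rng.standard_normal((4, 128))    # text features from the D-GRU
xcep_feat = rng.standard_normal((4, 2048))   # image features from Xception
vgg_feat = rng.standard_normal((4, 512))     # image features from VGG16
num_classes = 101
W = rng.standard_normal((128 + 2048 + 512, num_classes)) * 0.01
b = np.zeros(num_classes)

probs = fuse_and_classify(text_feat, xcep_feat, vgg_feat, W, b)
print(probs.shape)  # (4, 101)
```

In practice the dense weights `W`, `b` would be trained jointly with the rest of the network; the sketch only shows how the three feature streams meet in the fusion layer.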
Acknowledgments
This research is supported by the Erasmus+ Capacity Building in Higher Education project "Building Skills4.0 Through University and Enterprise Collaboration" (598649-EPP-1-2018-1-FR-EPPKA2-CBHE-JP), funded by the European Union, the National Natural Science Foundation of China (NSFC) (No. 61602064), and the Science and Technology Agency Project of Sichuan Province (No. 2021YFH0107).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Li, H., Li, D., Zhang, H., Yu, X., Tang, D., He, L. (2023). DXVNet Multimodal Classification Recognition Network Based on Dissymmetric Bidirectional D-GRU. In: Qian, Z., Jabbar, M., Cheung, S.K.S., Li, X. (eds) Proceeding of 2022 International Conference on Wireless Communications, Networking and Applications (WCNA 2022). WCNA 2022. Lecture Notes in Electrical Engineering, vol 1059. Springer, Singapore. https://doi.org/10.1007/978-981-99-3951-0_28
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-3950-3
Online ISBN: 978-981-99-3951-0
eBook Packages: Engineering (R0)