Abstract
Chinese sentiment analysis (CSA) has long been a challenge in natural language processing due to its complexity and uncertainty. The Transformer has been applied successfully to semantic understanding, but it captures sequence features only through positional encoding, which is inherently weaker than the order-aware processing of a recurrent model. To address this problem, we propose T-E-GRU, which combines the powerful global feature extraction of the Transformer encoder with the natural sequence feature extraction of the GRU for CSA. Experimental evaluations on three real Chinese datasets show that T-E-GRU has clear advantages over recurrent models, recurrent models with attention, and BERT-based models.
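The architecture described above — a Transformer encoder for global features followed by a GRU for sequence features — can be sketched as follows. This is a minimal illustration in PyTorch, not the authors' exact configuration: the vocabulary size, layer widths, head count, and class count are all assumed placeholders.

```python
import torch
import torch.nn as nn

class TEGRU(nn.Module):
    """Sketch of the T-E-GRU idea: Transformer encoder -> GRU -> classifier."""

    def __init__(self, vocab_size=1000, d_model=64, nhead=4, hidden=64, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Transformer encoder: global feature extraction via self-attention.
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)
        # GRU: captures word order recurrently, without positional encoding.
        self.gru = nn.GRU(d_model, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, x):
        h = self.encoder(self.embed(x))       # (batch, seq_len, d_model)
        _, last = self.gru(h)                 # last hidden state: (1, batch, hidden)
        return self.fc(last.squeeze(0))       # sentiment logits: (batch, num_classes)

model = TEGRU()
tokens = torch.randint(0, 1000, (2, 16))      # a batch of 2 sequences, length 16
logits = model(tokens)
print(logits.shape)                           # torch.Size([2, 2])
```

The design point the abstract makes is visible here: self-attention in the encoder relates every token to every other token regardless of distance, while the GRU that follows consumes the encoded sequence in order, so sequential structure does not have to be recovered from positional encodings alone.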
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11063-022-10966-8/MediaObjects/11063_2022_10966_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11063-022-10966-8/MediaObjects/11063_2022_10966_Fig2_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11063-022-10966-8/MediaObjects/11063_2022_10966_Fig3_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11063-022-10966-8/MediaObjects/11063_2022_10966_Fig4_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11063-022-10966-8/MediaObjects/11063_2022_10966_Fig5_HTML.png)
Acknowledgements
This work is supported by the Key Research and Development Program of Shaanxi Province (No. 2020KW-068), the National Natural Science Foundation of China under Grant (No. 62106199, No. 62002290, No. 62001385), China Postdoctoral Science Foundation (No. 2021MD703883) and General Project of Education Department of Shaanxi Provincial Government under Grant (No. 21JK0926).
Zhang, B., Zhou, W. Transformer-Encoder-GRU (T-E-GRU) for Chinese Sentiment Analysis on Chinese Comment Text. Neural Process Lett 55, 1847–1867 (2023). https://doi.org/10.1007/s11063-022-10966-8