
Sentiment analysis via semi-supervised learning: a model based on dynamic threshold and multi-classifiers

Original Article

Neural Computing and Applications

Abstract

Sentiment analysis has become a very popular research topic, especially for retrieving valuable information from various online environments. Most existing sentiment studies rely on supervised learning, which requires a sufficient amount of labeled data. In practice, however, sentiment analysis often faces a shortage of labeled data, because labeling large amounts of data is expensive and time-consuming. To handle the scenario of insufficient initial labeled data, we propose a novel semi-supervised model based on a dynamic threshold and multiple classifiers. In particular, the training data are auto-labeled iteratively by the proposed dynamic threshold algorithm, in which a dynamic threshold function sets the thresholds for selecting auto-labeled data. This function considers both the quality and the quantity of the auto-labeled data. In addition, the proposed weighted voting strategy combines multiple support vector machine (SVM) classifiers by taking into account the performance gaps among them. The performance of the proposed model is validated through experiments on real datasets. Compared with two other existing models, the proposed model achieves the highest sentiment analysis accuracy across datasets with different sizes of initial labeled training data.
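The abstract describes one round of self-training as two steps: an ensemble of classifiers scores the unlabeled data through a weighted vote, and a threshold that changes from round to round decides which predictions are confident enough to become auto-labels. The abstract does not give the actual threshold function or weighting formula, so the Python sketch below uses hypothetical stand-ins, not the authors' method. The threshold starts strict (favouring label quality) and decays linearly to a floor (admitting more data). Each classifier's weight is proportional to its accuracy margin over the weakest classifier, one possible reading of "considering performance gaps". The function names (`dynamic_threshold`, `vote_weights`, `weighted_vote`, `select_auto_labels`) and all default parameters are illustrative assumptions. SVM training is not shown; each classifier's P(positive) is assumed to be supplied as input.

```python
from typing import Sequence


def dynamic_threshold(iteration: int, start: float = 0.95,
                      decay: float = 0.05, floor: float = 0.70) -> float:
    """Hypothetical schedule (not from the paper): start strict to protect
    label quality, then relax linearly each round to admit more data,
    never dropping below `floor`."""
    return max(floor, start - decay * iteration)


def vote_weights(accuracies: Sequence[float], eps: float = 0.01) -> list[float]:
    """Hypothetical weighting: each classifier is rewarded by how far its
    validation accuracy exceeds the weakest one's. The small `eps` keeps the
    weakest classifier from being silenced entirely. Weights sum to 1."""
    worst = min(accuracies)
    gaps = [acc - worst + eps for acc in accuracies]
    total = sum(gaps)
    return [g / total for g in gaps]


def weighted_vote(pos_probs: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of each classifier's P(positive). Stays in [0, 1]
    because the weights sum to 1."""
    return sum(p * w for p, w in zip(pos_probs, weights))


def select_auto_labels(ensemble_probs: Sequence[Sequence[float]],
                       weights: Sequence[float],
                       threshold: float) -> list[tuple[int, int]]:
    """Return (sample index, label) pairs whose ensemble confidence reaches
    the threshold. ensemble_probs[i][k] is classifier k's P(positive) for
    unlabeled sample i. Label 1 = positive, 0 = negative."""
    selected = []
    for i, probs in enumerate(ensemble_probs):
        score = weighted_vote(probs, weights)
        confidence = max(score, 1.0 - score)  # confidence in the winning class
        if confidence >= threshold:
            selected.append((i, 1 if score >= 0.5 else 0))
    return selected


# Usage: three classifiers with validation accuracies 0.80, 0.85 and 0.90,
# scoring three unlabeled reviews in the first self-training round.
weights = vote_weights([0.80, 0.85, 0.90])
probs = [[0.97, 0.99, 0.98], [0.60, 0.40, 0.55], [0.02, 0.05, 0.01]]
print(select_auto_labels(probs, weights, dynamic_threshold(0)))  # [(0, 1), (2, 0)]
```

In this example the ambiguous second review (weighted score about 0.50) is left unlabeled in round 0. Because `dynamic_threshold` relaxes in later rounds, borderline samples can still be admitted once the training set has grown. This illustrates the quality/quantity trade-off the abstract mentions, under the assumed linear schedule.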


Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6


Notes

  1. The dataset can be downloaded from the website: http://ai.stanford.edu/~amaas/data/sentiment/.


Acknowledgements

This work was supported by the National Natural Science Foundation of China (NSFC) under Project 71502125.

Author information


Corresponding author

Correspondence to Zhigang **.



About this article


Cite this article

Han, Y., Liu, Y. & **, Z. Sentiment analysis via semi-supervised learning: a model based on dynamic threshold and multi-classifiers. Neural Comput & Applic 32, 5117–5129 (2020). https://doi.org/10.1007/s00521-018-3958-3


  • DOI: https://doi.org/10.1007/s00521-018-3958-3
