Abstract

Digital documents are generated and disseminated at scale in books, research papers, newspapers, online feedback, and other content, and discovering the topics in such large volumes of text is important but challenging. Current topic-modeling techniques offer limited means to deal with (1) the predominance of frequently occurring words in the estimated topics and (2) the overlap of commonly used words across different topics. We propose exclusive topic modeling (ETM) to identify field-specific keywords that occur less frequently but are important for representing certain topics, and to provide well-separated topics, each characterized by its own exclusive terms. Specifically, we impose a weighted LASSO penalty to automatically reduce the dominance of frequently occurring but less relevant words, and a pairwise Kullback–Leibler divergence penalty to achieve topic separation. Numerical studies show that ETM detects field-specific keywords and yields exclusive topics, which are more meaningful for interpretation and topic detection than those from other models such as the latent Dirichlet allocation (LDA) model.
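
The two penalties described above can be read as a penalized likelihood problem. The following display is a minimal sketch, not the chapter's exact formulation: the symbols $\mathcal{L}(\beta)$ (a generic topic-model log-likelihood), $\beta_{k,v}$ (the weight of word $v$ in topic $k$), $w_v$ (a penalty weight increasing in the corpus frequency of word $v$), and $\lambda_1, \lambda_2 \ge 0$ (tuning parameters) are assumptions introduced here for illustration.

% Sketch only; the notation (L, beta, w_v, lambda_1, lambda_2) is assumed,
% not taken from the chapter.
\[
  \max_{\beta}\; \mathcal{L}(\beta)
  \;-\; \lambda_{1} \sum_{k=1}^{K} \sum_{v=1}^{V} w_{v}\,\beta_{k,v}
  \;+\; \lambda_{2} \sum_{1 \le k < k' \le K}
        \mathrm{KL}\bigl(\beta_{k}\,\big\|\,\beta_{k'}\bigr)
\]
% The first term is the weighted LASSO penalty: frequent words receive
% large w_v and are shrunk in every topic. The second term rewards
% pairwise Kullback-Leibler divergence between topic-word distributions,
% pushing topics toward non-overlapping, exclusive vocabularies.

Under this reading, a larger $\lambda_1$ suppresses common but uninformative words, while a larger $\lambda_2$ enforces stronger separation between topics.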

Author information

Correspondence to Ying Chen.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Lei, H., Liu, K., Chen, Y. (2023). Exclusive Topic Model. In: Liu, Y., Hirukawa, J., Kakizawa, Y. (eds) Research Papers in Statistical Inference for Time Series and Related Models. Springer, Singapore. https://doi.org/10.1007/978-981-99-0803-5_3
