Abstract
In this work, we propose novel topic models to extract topics from multilingual documents. We make conventional latent Dirichlet allocation (LDA) more flexible by relaxing constraints imposed by its Dirichlet prior, replacing it with two alternative priors: the generalized Dirichlet and the Beta-Liouville distributions. We also extend the finite mixture model to the infinite case, which gives additional flexibility in modeling topics. The proposed models are learned with variational inference. We evaluated the framework on English and French documents, comparing the extracted topics and measuring their similarity with the Jaccard index. The results suggest that the proposed models are a promising alternative for topic modeling.
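To make the prior swap concrete, here is a minimal Python sketch (standard library only) of drawing one document's topic proportions from each of the three priors. It is not the authors' implementation, and the function names are hypothetical. The generalized Dirichlet is sampled with the Connor-Mosimann stick-breaking construction, and the Beta-Liouville with its Liouville representation (a Beta-distributed total mass split by a Dirichlet). The generalized Dirichlet shows what "relaxing constraints" means here: each breaking step has its own free pair (a_i, b_i), and the ordinary Dirichlet is recovered only when every b_i equals the sum of the remaining Dirichlet parameters.

```python
import random
from typing import List, Sequence


def sample_dirichlet(alpha: Sequence[float], rng: random.Random) -> List[float]:
    """Dirichlet draw (the standard LDA prior) via normalized Gamma variates."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]


def sample_generalized_dirichlet(
    a: Sequence[float], b: Sequence[float], rng: random.Random
) -> List[float]:
    """Generalized Dirichlet draw via Connor-Mosimann stick-breaking.

    With D pairs (a_i, b_i) this returns D + 1 proportions. Each
    V_i ~ Beta(a_i, b_i) breaks off a fraction of the remaining stick.
    Setting b_i = alpha_{i+1} + ... + alpha_{D+1} and a_i = alpha_i
    recovers Dirichlet(alpha); free b_i are the relaxed constraint.
    """
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    theta: List[float] = []
    remaining = 1.0
    for a_i, b_i in zip(a, b):
        v = rng.betavariate(a_i, b_i)
        theta.append(v * remaining)  # piece broken off at this step
        remaining *= 1.0 - v         # stick left for later components
    theta.append(remaining)          # last component takes the rest
    return theta


def sample_beta_liouville(
    alpha: Sequence[float], a: float, b: float, rng: random.Random
) -> List[float]:
    """Beta-Liouville draw: total mass u ~ Beta(a, b) split by Dirichlet(alpha).

    Returns len(alpha) + 1 proportions; the last one is 1 - u.
    """
    u = rng.betavariate(a, b)
    shares = sample_dirichlet(alpha, rng)
    return [u * s for s in shares] + [1.0 - u]


if __name__ == "__main__":
    rng = random.Random(0)
    print("Dirichlet:          ", sample_dirichlet([2.0, 3.0, 5.0], rng))
    print("Generalized Dir.:   ", sample_generalized_dirichlet([2.0, 3.0], [1.0, 4.0], rng))
    print("Beta-Liouville:     ", sample_beta_liouville([2.0, 3.0], 4.0, 2.0, rng))
```

For example, calling `sample_generalized_dirichlet([2.0, 3.0], [8.0, 5.0], rng)` (with `b` set to the tail sums of the Dirichlet parameters 2, 3, 5) yields draws distributed as Dirichlet(2, 3, 5); any other choice of `b` gives a prior the Dirichlet cannot express.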
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Maanicshah, K., Manouchehri, N., Amayri, M., Bouguila, N. (2023). Novel Topic Models for Parallel Topics Extraction from Multilingual Text. In: Nguyen, N.T., et al. Intelligent Information and Database Systems. ACIIDS 2023. Lecture Notes in Computer Science, vol 13996. Springer, Singapore. https://doi.org/10.1007/978-981-99-5837-5_25
DOI: https://doi.org/10.1007/978-981-99-5837-5_25
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-5836-8
Online ISBN: 978-981-99-5837-5
eBook Packages: Computer Science, Computer Science (R0)