Abstract
The problem of addressee detection (AD) arises in multiparty conversations involving several dialogue agents. To maintain such conversations realistically, an automatic spoken dialogue system must distinguish between computer-directed and human-directed utterances, since human-directed utterances either need to be processed in a specific way or should be ignored by the system altogether. In the present paper, we treat AD as a text classification problem and model three aspects of users’ speech (syntactic, lexical, and semantic) that are relevant to AD in German. We compare simple classifiers operating on supervised text representations learned from in-domain data with more advanced neural network-based models operating on unsupervised text representations learned from in- and out-of-domain data. The latter models provide a small yet significant improvement in AD performance over the classical ones on the Smart Video Corpus. A neural network-based semantic model identifies the context of the first four words of an utterance as the most informative for AD; it significantly outperforms the syntactic and lexical text classifiers and performs on par with a baseline multimodal metaclassifier that utilises acoustic information in addition to textual data. We also propose an effective approach to building representations for out-of-vocabulary words.
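Framing AD as binary text classification means each utterance is mapped to either "computer" (machine-directed) or "human" (human-directed). The sketch below illustrates only the general idea of a classical classifier built on supervised term weights, i.e. weights computed from the class labels, unlike unsupervised schemes such as TF-IDF. It is not the authors' method: the weighting formula, the whitespace tokeniser, the `max_words=4` truncation (loosely echoing the finding about the first four words), the class names and the toy German utterances are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy data (invented for illustration, NOT from the Smart Video Corpus).
TOY_TRAIN = [
    ("zeig mir bitte restaurants in ulm", "computer"),
    ("wie ist das wetter morgen in berlin", "computer"),
    ("suche mir ein hotel in der nähe", "computer"),
    ("was meinst du wollen wir dahin", "human"),
    ("ich finde das klingt gut oder", "human"),
    ("hast du lust auf pizza heute", "human"),
]


def tokenize(utterance, max_words=None):
    """Lower-case whitespace tokenisation; optionally keep only the first max_words tokens."""
    tokens = utterance.lower().split()
    return tokens[:max_words] if max_words is not None else tokens


class TermWeightingClassifier:
    """Toy supervised term weighting (an assumption, not the paper's scheme):
    weight(t, c) = occurrences of t in class c / occurrences of t in all classes.
    An utterance is assigned the class with the highest summed term weight."""

    def __init__(self, max_words=4):
        self.max_words = max_words
        self.weights = {}   # term -> {class: weight}
        self.classes = []

    def fit(self, utterances, labels):
        # Count term occurrences separately for each class label.
        per_class = defaultdict(Counter)
        for text, label in zip(utterances, labels):
            per_class[label].update(tokenize(text, self.max_words))
        self.classes = sorted(per_class)
        # Total occurrences of each term across all classes.
        totals = Counter()
        for counts in per_class.values():
            totals.update(counts)
        # Supervised weight: share of a term's occurrences that fall into each class.
        self.weights = {
            term: {c: per_class[c][term] / total for c in self.classes}
            for term, total in totals.items()
        }
        return self

    def predict(self, utterance):
        scores = {c: 0.0 for c in self.classes}
        for term in tokenize(utterance, self.max_words):
            # Unseen terms contribute nothing to any class score.
            for c, w in self.weights.get(term, {}).items():
                scores[c] += w
        # max() returns the first maximal class, so ties go to the first class in sorted order.
        return max(self.classes, key=lambda c: scores[c])


if __name__ == "__main__":
    texts, labels = zip(*TOY_TRAIN)
    clf = TermWeightingClassifier(max_words=4).fit(texts, labels)
    for query in ["zeig mir hotels in berlin", "was meinst du dazu"]:
        print(query, "->", clf.predict(query))
```

With only the first four words kept, terms that appear later in an utterance (such as "in" in the first training example) never enter the vocabulary; real systems would use far larger data and the richer syntactic, lexical and semantic representations described in the abstract.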
Acknowledgements
This research is partially financially supported by DAAD jointly with the Ministry of Education and Science of Russia within the Michail Lomonosov Program (project No. 2.12779.2018/12.2), by RFBR (project No. 18-07-01407) and by the Government of Russia (grant No. 08-08).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Akhtiamov, O., Fedotov, D., Minker, W. (2019). A Comparative Study of Classical and Deep Classifiers for Textual Addressee Detection in Human-Human-Machine Conversations. In: Salah, A., Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2019. Lecture Notes in Computer Science, vol 11658. Springer, Cham. https://doi.org/10.1007/978-3-030-26061-3_3
DOI: https://doi.org/10.1007/978-3-030-26061-3_3
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-26060-6
Online ISBN: 978-3-030-26061-3
eBook Packages: Computer Science, Computer Science (R0)