A Comparative Study of Classical and Deep Classifiers for Textual Addressee Detection in Human-Human-Machine Conversations

Akhtiamov, Oleg; Fedotov, Dmitrii; Minker, Wolfgang

doi:10.1007/978-3-030-26061-3_3

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11658)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

The problem of addressee detection (AD) arises in multiparty conversations involving several dialogue agents. In order to maintain such conversations in a realistic manner, an automatic spoken dialogue system is supposed to distinguish between computer- and human-directed utterances since the latter utterances either need to be processed in a specific way or should be completely ignored by the system. In the present paper, we consider AD to be a text classification problem and model three aspects of users’ speech (syntactical, lexical, and semantical) that are relevant to AD in German. We compare simple classifiers operating with supervised text representations learned from in-domain data and more advanced neural network-based models operating with unsupervised text representations learned from in- and out-of-domain data. The latter models provide a small yet significant AD performance improvement over the classical ones on the Smart Video Corpus. A neural network-based semantical model determines the context of the first four words of an utterance to be the most informative for AD, significantly surpasses syntactical and lexical text classifiers and keeps up with a baseline multimodal metaclassifier that utilises acoustical information in addition to textual data. We also propose an effective approach to building representations for out-of-vocabulary words.
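
As a rough illustration of treating AD as text classification, the sketch below trains a tiny multinomial naive Bayes classifier on only the first four tokens of each utterance. It is not the authors' model: the classifier choice, the whitespace tokenizer, the names first_k_tokens and NaiveBayesAD, the class labels, and the German toy utterances are all assumptions made for illustration, and none of the data comes from the Smart Video Corpus.

```python
import math
from collections import Counter


def first_k_tokens(utterance, k=4):
    """Lowercase, split on whitespace, and keep only the first k tokens."""
    return utterance.lower().split()[:k]


class NaiveBayesAD:
    """Hypothetical bag-of-words naive Bayes classifier with add-one smoothing."""

    def fit(self, utterances, labels, k=4):
        self.k = k
        self.class_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.class_counts}
        for text, label in zip(utterances, labels):
            self.word_counts[label].update(first_k_tokens(text, k))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, utterance):
        tokens = first_k_tokens(utterance, self.k)
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            # Log prior of the class.
            score = math.log(self.class_counts[label] / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                if tok in self.vocab:  # out-of-vocabulary tokens are skipped
                    score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy data: "machine" = computer-directed, "human" = human-directed.
train_texts = [
    "zeige mir das kinoprogramm für heute",
    "suche ein restaurant in der nähe",
    "was meinst du sollen wir gehen",
    "ich glaube das ist zu teuer",
]
train_labels = ["machine", "machine", "human", "human"]

model = NaiveBayesAD().fit(train_texts, train_labels, k=4)
prediction = model.predict("zeige mir ein restaurant bitte")
```

Out-of-vocabulary tokens are simply skipped in this sketch; the paper proposes a dedicated approach to representing them, which is not reproduced here.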

Acknowledgements

This research is supported in part by DAAD jointly with the Ministry of Education and Science of Russia within the Michail Lomonosov Program (project No. 2.12779.2018/12.2), by RFBR (project No. 18-07-01407), and by the Government of Russia (grant No. 08-08).

Author information

Authors and Affiliations

Institute of Communications Engineering, Ulm University, Ulm, Germany
Oleg Akhtiamov, Dmitrii Fedotov & Wolfgang Minker
ITMO University, St. Petersburg, Russia
Oleg Akhtiamov & Dmitrii Fedotov

Corresponding author

Correspondence to Oleg Akhtiamov.

Editor information

Editors and Affiliations

Utrecht University, Utrecht, The Netherlands
Albert Ali Salah
St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences, St. Petersburg, Russia
Alexey Karpov
Moscow State Linguistic University, Moscow, Russia
Rodmonga Potapova

Cite this paper

Akhtiamov, O., Fedotov, D., Minker, W. (2019). A Comparative Study of Classical and Deep Classifiers for Textual Addressee Detection in Human-Human-Machine Conversations. In: Salah, A., Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2019. Lecture Notes in Computer Science, vol 11658. Springer, Cham. https://doi.org/10.1007/978-3-030-26061-3_3

DOI: https://doi.org/10.1007/978-3-030-26061-3_3
Published: 24 July 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-26060-6
Online ISBN: 978-3-030-26061-3
eBook Packages: Computer Science; Computer Science (R0)
