A Comparative Study of Classical and Deep Classifiers for Textual Addressee Detection in Human-Human-Machine Conversations

  • Conference paper
Speech and Computer (SPECOM 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11658)

Abstract

The problem of addressee detection (AD) arises in multiparty conversations involving several dialogue agents. To maintain such conversations realistically, an automatic spoken dialogue system must distinguish between computer-directed and human-directed utterances, since the latter either need to be processed in a specific way or should be ignored by the system entirely. In the present paper, we treat AD as a text classification problem and model three aspects of users’ speech that are relevant to AD in German: syntactic, lexical, and semantic. We compare simple classifiers that use supervised text representations learned from in-domain data with more advanced neural network-based models that use unsupervised text representations learned from both in-domain and out-of-domain data. On the Smart Video Corpus, the latter models provide a small but significant AD performance improvement over the classical ones. A neural network-based semantic model identifies the context formed by the first four words of an utterance as the most informative for AD. This model significantly outperforms the syntactic and lexical text classifiers and performs on par with a baseline multimodal metaclassifier that uses acoustic information in addition to textual data. We also propose an effective approach to building representations for out-of-vocabulary words.
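
To make the text-classification framing of AD concrete, here is a minimal Python sketch of a classical pipeline of the kind the abstract contrasts with neural models. Utterances get a supervised term weighting, meaning weights computed from class labels, and are then assigned to the computer-directed ("C") or human-directed ("H") class. The sketch assumes two techniques swapped in purely for illustration: the relevance frequency (tf·rf) weighting scheme and a nearest-centroid classifier with cosine similarity. It does not reproduce the paper's actual weighting schemes, classifiers, features, or its out-of-vocabulary approach. All function names and the toy German utterances are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical minimal tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def relevance_frequency(docs, labels, positive):
    """Supervised weight rf(t) = log2(2 + a / max(1, c)), where a is the number
    of `positive`-class documents containing term t and c is the number of
    documents of any other class containing t."""
    a, c = Counter(), Counter()
    for doc, label in zip(docs, labels):
        for term in set(tokenize(doc)):
            if label == positive:
                a[term] += 1
            else:
                c[term] += 1
    return {t: math.log2(2 + a[t] / max(1, c[t])) for t in set(a) | set(c)}


def vectorize(text, rf):
    # tf * rf weighting; terms unseen in training are simply dropped here
    # (this does NOT implement the paper's out-of-vocabulary approach).
    tf = Counter(tokenize(text))
    return {t: n * rf[t] for t, n in tf.items() if t in rf}


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def centroids(docs, labels, rf):
    # Mean tf*rf vector per class (nearest-centroid classifier).
    sums, counts = {}, Counter(labels)
    for doc, label in zip(docs, labels):
        acc = sums.setdefault(label, Counter())
        for t, w in vectorize(doc, rf).items():
            acc[t] += w
    return {lab: {t: w / counts[lab] for t, w in acc.items()}
            for lab, acc in sums.items()}


def predict(text, cents, rf):
    # Assign the class whose centroid is most cosine-similar to the utterance.
    vec = vectorize(text, rf)
    return max(cents, key=lambda lab: cosine(vec, cents[lab]))


# Made-up toy utterances (NOT from the Smart Video Corpus):
# "C" = computer-directed, "H" = human-directed.
docs = [
    "zeig mir bitte das wetter in berlin",
    "zeig mir den weg zum bahnhof",
    "was meinst du dazu",
    "ich glaube wir sollten gehen",
]
labels = ["C", "C", "H", "H"]

rf = relevance_frequency(docs, labels, positive="C")
cents = centroids(docs, labels, rf)
print(predict("zeig mir das wetter", cents, rf))  # → C
print(predict("was meinst du", cents, rf))  # → H
```

The nearest-centroid step is only a stand-in. Any classifier that accepts sparse weighted vectors could replace it without changing the weighting step.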

Acknowledgements

This research was financially supported in part by DAAD, jointly with the Ministry of Education and Science of Russia within the Michail Lomonosov Program (project No. 2.12779.2018/12.2), by RFBR (project No. 18-07-01407), and by the Government of Russia (grant No. 08-08).

Author information

Correspondence to Oleg Akhtiamov.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Akhtiamov, O., Fedotov, D., Minker, W. (2019). A Comparative Study of Classical and Deep Classifiers for Textual Addressee Detection in Human-Human-Machine Conversations. In: Salah, A., Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2019. Lecture Notes in Computer Science, vol. 11658. Springer, Cham. https://doi.org/10.1007/978-3-030-26061-3_3

  • DOI: https://doi.org/10.1007/978-3-030-26061-3_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-26060-6

  • Online ISBN: 978-3-030-26061-3

  • eBook Packages: Computer Science, Computer Science (R0)
