Abstract
Task-oriented dialogue systems are commonly formulated as reinforcement learning (RL) problems, in which a reward provided at the end of a generated dialogue serves as the learning objective. Since completing a task often takes many turns between the system and the user, a single scalar reward issued after this long process is delayed and sparse. To address these problems in RL-based task-completion systems, we propose a novel hierarchical attentive adversarial network, HaGAN, which features a cascaded attentive generator (CAG) that explores the state-action space to generate dialogues, and global-local attentive discriminators (GLAD) that provide relevant rewards at multiple dialogue scales. Specifically, after every generated turn, the turn-based discriminator evaluates the current turn and gives a local reward reflecting the generator's current generating ability. When the dialogue finishes, the dialogue-based discriminator gives a global reward concerning the whole dialogue. Finally, a synthesized reward, computed by combining the global and local rewards, is returned to the generator. In this way, the generator learns to produce dialogues that are both globally and locally fluent and informative. Experiments on two public benchmark datasets demonstrate the superiority of HaGAN over other representative state-of-the-art methods.
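The reward-synthesis idea described above can be sketched in a few lines. This is an illustrative toy only: the discriminators here are stand-in scalar scores rather than the paper's actual GLAD networks, and the mixing weight `alpha` is a hypothetical hyperparameter not specified in the abstract.

```python
def synthesized_rewards(local_rewards, global_reward, alpha=0.5):
    """Combine per-turn (local) rewards with one dialogue-level (global)
    reward into a per-turn training signal for the generator.

    local_rewards: one score per generated turn, from a turn-level judge.
    global_reward: one score for the finished dialogue as a whole.
    alpha: hypothetical weight trading off local vs. global feedback.
    """
    return [alpha * r_local + (1.0 - alpha) * global_reward
            for r_local in local_rewards]

# Example: a 3-turn dialogue whose turns scored 0.2, 0.6, 0.9 locally,
# and 0.8 globally once the whole dialogue was judged.
rewards = synthesized_rewards([0.2, 0.6, 0.9], 0.8, alpha=0.5)
```

Blending the two signals gives the generator dense per-turn feedback (mitigating the delayed, sparse end-of-dialogue reward) while still anchoring every turn to the quality of the dialogue as a whole.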
Acknowledgements
This work is supported in part by the Chinese National Double First-rate Project on the digital protection of cultural relics in grotto temples, and by the equipment upgrading of the scientific research institutes of the Chinese National Cultural Heritage Administration.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Fang, T., Qiao, T., Xu, D. (2019). HaGAN: Hierarchical Attentive Adversarial Learning for Task-Oriented Dialogue System. In: Gedeon, T., Wong, K., Lee, M. (eds) Neural Information Processing. ICONIP 2019. Lecture Notes in Computer Science(), vol 11953. Springer, Cham. https://doi.org/10.1007/978-3-030-36708-4_9
DOI: https://doi.org/10.1007/978-3-030-36708-4_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-36707-7
Online ISBN: 978-3-030-36708-4
eBook Packages: Computer Science (R0)