Abstract
This work develops a reinforcement learning method for multi-agent negotiation. While existing works have proposed various learning methods for multi-agent negotiation, they have primarily focused on Temporal-Difference (TD) algorithms (action-value methods) in general and overlooked the unique properties of parameterized policies. As such, these methods can be suboptimal for multi-agent negotiation. In this paper, we study the problem of multi-agent negotiation in the real-time bidding scenario. We propose a new method named EQL, short for Extended Q-learning, which iteratively assigns the state transition probability and effectively converges to a unique optimum. By purposefully performing a linear approximation of the off-policy critic, we integrate Expected Policy Gradients (EPG) into basic Q-learning. We then propose a novel negotiation framework that combines EQL with edge computing between mobile devices and cloud servers, handling data preprocessing and transmission simultaneously to reduce the load on cloud servers. We conduct extensive experiments on two real datasets. Both quantitative results and qualitative analysis verify the effectiveness and rationality of our EQL method.
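The abstract describes integrating Expected Policy Gradients into basic Q-learning. The paper's exact EQL update is not reproduced here, but the general idea of replacing Q-learning's max-target with an expectation of next-state action values under a parameterized (softmax) policy can be sketched as follows; the function names, the softmax policy, and the Expected-SARSA-style target are all illustrative assumptions, not the authors' definitive algorithm.

```python
import numpy as np

def softmax(x, tau=1.0):
    # Numerically stable softmax, serving as the parameterized policy.
    z = (x - x.max()) / tau
    e = np.exp(z)
    return e / e.sum()

def eql_style_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95, tau=1.0):
    """One hypothetical EQL-style tabular update.

    Instead of plain Q-learning's max over next-state actions,
    the target averages Q[s_next] under a softmax policy,
    folding the policy's action distribution into the critic.
    """
    pi_next = softmax(Q[s_next], tau)          # policy over next actions
    target = r + gamma * pi_next @ Q[s_next]   # expected, not max, backup
    Q[s, a] += alpha * (target - Q[s, a])      # TD update toward target
    return Q

# Minimal usage: a 2-state, 2-action table; one transition with reward 1.
Q = np.zeros((2, 2))
Q = eql_style_update(Q, s=0, a=1, r=1.0, s_next=1)
```

With a zero-initialized table, the expected backup and the max backup coincide after this first update; the two targets diverge only once the next-state values differ across actions.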
C. Kong and B. Chen—The two authors contributed equally to this work.
Acknowledgment
This work was supported by the National Natural Science Foundation of China Youth Fund under Grant No. 61902001 and Initial Scientific Research Fund of Introduced Talents in Anhui Polytechnic University under Grant No. 2017YQQ015.
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Kong, C., Chen, B., Li, S., Chen, J., Chen, Y., Zhang, L. (2020). An Advanced Q-Learning Model for Multi-agent Negotiation in Real-Time Bidding. In: Wang, G., Lin, X., Hendler, J., Song, W., Xu, Z., Liu, G. (eds) Web Information Systems and Applications. WISA 2020. Lecture Notes in Computer Science(), vol 12432. Springer, Cham. https://doi.org/10.1007/978-3-030-60029-7_44
Print ISBN: 978-3-030-60028-0
Online ISBN: 978-3-030-60029-7