Abstract
This work develops a reinforcement learning method for multi-agent negotiation. While existing works have proposed various learning methods for multi-agent negotiation, they have primarily focused on Temporal-Difference (TD) algorithms (action-value methods) in general and overlooked the unique properties of parameterized policies. As such, these methods can be suboptimal for multi-agent negotiation. In this paper, we study the problem of multi-agent negotiation in the real-time bidding scenario. We propose a new method named EQL, short for Extended Q-learning, which iteratively assigns the state transition probability and effectively converges to a unique optimum. By purposefully performing a linear approximation of the off-policy critic, we integrate Expected Policy Gradients (EPG) into basic Q-learning. We then propose a novel negotiation framework that combines EQL with edge computing between mobile devices and cloud servers, handling data preprocessing and transmission simultaneously to reduce the load on cloud servers. We conduct extensive experiments on two real datasets. Both quantitative results and qualitative analysis verify the effectiveness and rationality of our EQL method.
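The abstract describes integrating Expected Policy Gradients into basic Q-learning. The paper's exact EQL update is not reproduced here, but the general idea of replacing Q-learning's max-target with an expectation of next-state action values under a parameterized (softmax) policy can be sketched as follows; the function names, the softmax policy, and the Expected-SARSA-style target are all illustrative assumptions, not the authors' definitive algorithm.

```python
import numpy as np

def softmax(x, tau=1.0):
    # Numerically stable softmax, serving as the parameterized policy.
    z = (x - x.max()) / tau
    e = np.exp(z)
    return e / e.sum()

def eql_style_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95, tau=1.0):
    """One hypothetical EQL-style tabular update.

    Instead of plain Q-learning's max over next-state actions,
    the target averages Q[s_next] under a softmax policy,
    folding the policy's action distribution into the critic.
    """
    pi_next = softmax(Q[s_next], tau)          # policy over next actions
    target = r + gamma * pi_next @ Q[s_next]   # expected, not max, backup
    Q[s, a] += alpha * (target - Q[s, a])      # TD update toward target
    return Q

# Minimal usage: a 2-state, 2-action table; one transition with reward 1.
Q = np.zeros((2, 2))
Q = eql_style_update(Q, s=0, a=1, r=1.0, s_next=1)
```

With a zero-initialized table, the expected backup and the max backup coincide after this first update; the two targets diverge only once the next-state values differ across actions.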
C. Kong and B. Chen—The two authors contributed equally to this work.
Acknowledgment
This work was supported by the National Natural Science Foundation of China Youth Fund under Grant No. 61902001 and Initial Scientific Research Fund of Introduced Talents in Anhui Polytechnic University under Grant No. 2017YQQ015.
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Kong, C., Chen, B., Li, S., Chen, J., Chen, Y., Zhang, L. (2020). An Advanced Q-Learning Model for Multi-agent Negotiation in Real-Time Bidding. In: Wang, G., Lin, X., Hendler, J., Song, W., Xu, Z., Liu, G. (eds) Web Information Systems and Applications. WISA 2020. Lecture Notes in Computer Science(), vol 12432. Springer, Cham. https://doi.org/10.1007/978-3-030-60029-7_44
Print ISBN: 978-3-030-60028-0
Online ISBN: 978-3-030-60029-7