An Advanced Q-Learning Model for Multi-agent Negotiation in Real-Time Bidding

  • Conference paper
Web Information Systems and Applications (WISA 2020)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12432)


Abstract

This work develops a reinforcement learning method for multi-agent negotiation. While existing works have developed various learning methods for multi-agent negotiation, they have primarily focused on Temporal-Difference (TD) algorithms (action-value methods) in general and overlooked the unique properties of parameterized policies. As such, these methods can be suboptimal for multi-agent negotiation. In this paper, we study the problem of multi-agent negotiation in the real-time bidding scenario. We propose a new method named EQL, short for Extended Q-learning, which iteratively updates the state transition probabilities and converges effectively to a unique optimum. By purposefully performing a linear approximation of the off-policy critic, we integrate Expected Policy Gradients (EPG) into basic Q-learning. Importantly, we then propose a novel negotiation framework that combines EQL with edge computing between mobile devices and cloud servers, handling data preprocessing and transmission simultaneously to reduce the load on cloud servers. We conduct extensive experiments on two real datasets. Both quantitative results and qualitative analysis verify the effectiveness and rationality of our EQL method.
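For context, the basic Q-learning update that EQL extends (Watkins and Dayan's tabular rule) can be sketched as follows. The toy "negotiation" environment, its state and action names, and the reward values below are hypothetical illustrations for the standard baseline, not the paper's EQL method or its EPG integration:

```python
import random

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    # Standard Watkins Q-learning target: r + gamma * max_a' Q(s', a')
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (r + gamma * best_next - Q.get((s, a), 0.0))

# Hypothetical one-step bidding environment: a "high" bid wins the
# auction (reward 1) and ends the episode, a "low" bid loses (reward 0).
ACTIONS = ["low", "high"]

def step(state, action):
    if action == "high":
        return "won", 1.0
    return "lost", 0.0

random.seed(0)
Q = {}
for episode in range(500):
    s = "start"
    # Epsilon-greedy action selection: explore 10% of the time.
    if random.random() < 0.1:
        a = random.choice(ACTIONS)
    else:
        a = max(ACTIONS, key=lambda b: Q.get((s, b), 0.0))
    s_next, r = step(s, a)
    q_update(Q, s, a, r, s_next, ACTIONS)

# After training, the "high" bid should carry the larger action value.
```

EQL's contribution, per the abstract, lies in how the state transition probabilities are iteratively assigned and in folding an EPG-style off-policy critic into this update; the tabular rule above is only the starting point it builds on.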

C. Kong and B. Chen—The two authors contributed equally to this work.


Notes

  1. https://www.kaggle.com/a**kyablaze/football-manager-data.

  2. https://www.kaggle.com/thec03u5/fifa-18-demo-player-dataset.


Acknowledgment

This work was supported by the National Natural Science Foundation of China Youth Fund under Grant No. 61902001 and Initial Scientific Research Fund of Introduced Talents in Anhui Polytechnic University under Grant No. 2017YQQ015.

Author information

Correspondence to Chao Kong.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Kong, C., Chen, B., Li, S., Chen, J., Chen, Y., Zhang, L. (2020). An Advanced Q-Learning Model for Multi-agent Negotiation in Real-Time Bidding. In: Wang, G., Lin, X., Hendler, J., Song, W., Xu, Z., Liu, G. (eds) Web Information Systems and Applications. WISA 2020. Lecture Notes in Computer Science, vol 12432. Springer, Cham. https://doi.org/10.1007/978-3-030-60029-7_44

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-60029-7_44

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-60028-0

  • Online ISBN: 978-3-030-60029-7

  • eBook Packages: Computer Science (R0)
