Abstract
Reinforcement learning is an active field of research with many practical applications, and in many cases its success is driven by combining it with deep learning. In this paper we present the results of our attempt to apply recent advances in this area to the automated management of resources used to host distributed software. We describe the use of three policy training algorithms from the policy gradient optimization family to create a policy that controls the behavior of an autonomous management agent. The agent interacts with a simulated cloud computing environment that processes a stream of computing jobs. We discuss and compare the performance of the resulting policies and the feasibility of using them in real-world scenarios.
Acknowledgements
The paper was partially financed by AGH University of Science and Technology Statutory Fund. Computational experiments were carried out on the PL-Grid infrastructure.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Funika, W., Koperek, P. (2020). Evaluating the Use of Policy Gradient Optimization Approach for Automatic Cloud Resource Provisioning. In: Wyrzykowski, R., Deelman, E., Dongarra, J., Karczewski, K. (eds) Parallel Processing and Applied Mathematics. PPAM 2019. Lecture Notes in Computer Science, vol 12043. Springer, Cham. https://doi.org/10.1007/978-3-030-43229-4_40
DOI: https://doi.org/10.1007/978-3-030-43229-4_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-43228-7
Online ISBN: 978-3-030-43229-4
eBook Packages: Computer Science (R0)