Abstract
Reinforcement Learning (RL) is an emerging approach to designing control systems that learn an optimal policy, through simulated or actual experience, according to a performance measure specified by the designer. This paper focuses on Q-learning, a widely used RL algorithm, and discusses how it can be applied to robotics and optimal control systems, where several key challenges must be addressed before it becomes practical. We discuss how the Q-learning algorithm can be adapted to continuous state and action spaces, methods of computing rewards that yield an adaptive optimal controller and accelerate the learning process, and, finally, approaches to safe exploration.
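To make the setting concrete, the tabular form of Q-learning that the paper builds on can be sketched as follows. The five-state chain environment, the ε-greedy behaviour policy, and all parameter values here are illustrative assumptions for the sketch, not details taken from the paper:

```python
import random

def q_learning(n_states, n_actions, step, episodes=1000,
               alpha=0.1, gamma=0.95, epsilon=0.2):
    """Tabular Q-learning: learn Q(s, a) from experience of the MDP."""
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0                                  # fixed start state
        for _ in range(100):                   # cap episode length
            if random.random() < epsilon:      # epsilon-greedy exploration
                a = random.randrange(n_actions)
            else:                              # greedy, ties broken at random
                best = max(Q[s])
                a = random.choice([i for i, q in enumerate(Q[s]) if q == best])
            s2, r, done = step(s, a)
            # off-policy update toward the greedy one-step target
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
            if done:
                break
    return Q

# Toy chain MDP (an illustrative stand-in for a control task):
# states 0..4, action 1 moves right, action 0 moves left;
# reaching state 4 yields reward 1 and ends the episode.
def chain_step(s, a):
    s2 = min(s + 1, 4) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == 4 else 0.0), s2 == 4

random.seed(0)
Q = q_learning(5, 2, chain_step)
policy = [max(range(2), key=lambda a: Q[s][a]) for s in range(5)]
print(policy[:4])  # greedy policy in states 0..3 should move right
```

In a real robotic task the discrete table above must be replaced by a function approximator over continuous states and actions, and the reward and exploration strategy chosen with care: exactly the challenges the paper surveys.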
© 2017 Springer International Publishing AG
Cite this paper
El-Telbany, M.E. (2017). The Challenges of Reinforcement Learning in Robotics and Optimal Control. In: Hassanien, A., Shaalan, K., Gaber, T., Azar, A., Tolba, M. (eds) Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2016. AISI 2016. Advances in Intelligent Systems and Computing, vol 533. Springer, Cham. https://doi.org/10.1007/978-3-319-48308-5_84
Print ISBN: 978-3-319-48307-8
Online ISBN: 978-3-319-48308-5