A Review of Deep Reinforcement Learning Algorithms and Comparative Results on Inverted Pendulum System

  • Chapter
Machine Learning Paradigms

Part of the book series: Learning and Analytics in Intelligent Systems (LAIS, volume 18)

Abstract

The control of the inverted pendulum, one of the classical control problems, is important for many areas, from autonomous vehicles to robotics. This chapter presents the use of deep reinforcement learning algorithms to control the cart-pole balancing problem. The first part of the chapter reviews the theory of deep reinforcement learning methods such as Deep Q-Networks (DQN), DQN with Prioritized Experience Replay (DQN+PER), Double DQN (DDQN), Double Dueling Deep Q-Network (D3QN), REINFORCE, Asynchronous Advantage Actor-Critic (A3C), and Synchronous Advantage Actor-Critic (A2C). Then, the cart-pole balancing problem in the OpenAI Gym environment is considered to implement the deep reinforcement learning methods. Finally, the performance of all methods is compared on the cart-pole balancing problem, with the results presented in tables and figures.
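The value-based methods reviewed above differ mainly in how they form the bootstrap target. As a minimal sketch (not the chapter's implementation; array shapes and variable names are illustrative), the DQN and Double DQN targets for a small batch of cart-pole transitions can be written in NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.99  # discount factor

# Illustrative Q-value estimates for a batch of 4 next-states and
# 2 actions (CartPole's action space: push cart left or right).
q_online = rng.normal(size=(4, 2))   # online network's Q(s', a)
q_target = rng.normal(size=(4, 2))   # target network's Q(s', a)
rewards = np.ones(4)                 # CartPole gives +1 per surviving step
done = np.array([0.0, 0.0, 0.0, 1.0])  # last transition is terminal

# DQN target: bootstrap from the max of the target network.
y_dqn = rewards + gamma * (1.0 - done) * q_target.max(axis=1)

# Double DQN target: the online network selects the action, the target
# network evaluates it, which reduces the overestimation bias of DQN.
a_star = q_online.argmax(axis=1)
y_ddqn = rewards + gamma * (1.0 - done) * q_target[np.arange(4), a_star]

print(y_dqn.shape, y_ddqn.shape)  # both (4,)
```

Because the target network evaluated at the online network's argmax can never exceed its own maximum, the Double DQN target is elementwise no larger than the DQN target, which is the mechanism behind its reduced overestimation.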



Acknowledgements

This work was supported by the Scientific and Technological Research Council of Turkey (TUBITAK) under grant number 117E589. In addition, the GTX Titan X Pascal GPU used in this research was donated by the NVIDIA Corporation.

Author information

Corresponding author

Correspondence to Ayşegül Uçar.

Copyright information

© 2020 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Özalp, R., Varol, N.K., Taşci, B., Uçar, A. (2020). A Review of Deep Reinforcement Learning Algorithms and Comparative Results on Inverted Pendulum System. In: Tsihrintzis, G., Jain, L. (eds) Machine Learning Paradigms. Learning and Analytics in Intelligent Systems, vol 18. Springer, Cham. https://doi.org/10.1007/978-3-030-49724-8_10
