A Review of Deep Reinforcement Learning Algorithms and Comparative Results on Inverted Pendulum System

  • Chapter
Machine Learning Paradigms

Part of the book series: Learning and Analytics in Intelligent Systems (LAIS, volume 18)

Abstract

The control of the inverted pendulum, one of the classical control problems, is important for many areas, from autonomous vehicles to robotics. This chapter presents the use of deep reinforcement learning algorithms to control the cart-pole balancing problem. The first part of the chapter reviews the theory of deep reinforcement learning methods such as Deep Q-Networks (DQN), DQN with Prioritized Experience Replay (DQN+PER), Double DQN (DDQN), Double Dueling Deep Q-Network (D3QN), REINFORCE, Asynchronous Advantage Actor-Critic (A3C), and Synchronous Advantage Actor-Critic (A2C). Then, the cart-pole balancing problem in the OpenAI Gym environment is considered to implement the deep reinforcement learning methods. Finally, the performance of all methods is compared on the cart-pole balancing problem, with the results presented in tables and figures.
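The value-based methods reviewed above differ mainly in how they form the bootstrap target. As a minimal sketch (not the chapter's implementation; array shapes and variable names are illustrative), the DQN and Double DQN targets for a small batch of cart-pole transitions can be written in NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.99  # discount factor

# Illustrative Q-value estimates for a batch of 4 next-states and
# 2 actions (CartPole's action space: push cart left or right).
q_online = rng.normal(size=(4, 2))   # online network's Q(s', a)
q_target = rng.normal(size=(4, 2))   # target network's Q(s', a)
rewards = np.ones(4)                 # CartPole gives +1 per surviving step
done = np.array([0.0, 0.0, 0.0, 1.0])  # last transition is terminal

# DQN target: bootstrap from the max of the target network.
y_dqn = rewards + gamma * (1.0 - done) * q_target.max(axis=1)

# Double DQN target: the online network selects the action, the target
# network evaluates it, which reduces the overestimation bias of DQN.
a_star = q_online.argmax(axis=1)
y_ddqn = rewards + gamma * (1.0 - done) * q_target[np.arange(4), a_star]

print(y_dqn.shape, y_ddqn.shape)  # both (4,)
```

Because the target network evaluated at the online network's argmax can never exceed its own maximum, the Double DQN target is elementwise no larger than the DQN target, which is the mechanism behind its reduced overestimation.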



Acknowledgements

This work was supported by the Scientific and Technological Research Council of Turkey (TUBITAK) under grant number 117E589. In addition, the GTX Titan X Pascal GPU used in this research was donated by the NVIDIA Corporation.

Author information

Corresponding author

Correspondence to Ayşegül Uçar.

Copyright information

© 2020 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Özalp, R., Varol, N.K., Taşci, B., Uçar, A. (2020). A Review of Deep Reinforcement Learning Algorithms and Comparative Results on Inverted Pendulum System. In: Tsihrintzis, G., Jain, L. (eds) Machine Learning Paradigms. Learning and Analytics in Intelligent Systems, vol 18. Springer, Cham. https://doi.org/10.1007/978-3-030-49724-8_10
