Reinforcement Learning DDPG–PPO Agent-Based Control System for Rotary Inverted Pendulum

  • Research Article – Mechanical Engineering
  • Published in: Arabian Journal for Science and Engineering

Abstract

The rotary inverted pendulum (RIP) is a nonlinear system widely used as a benchmark for testing control strategies. The RIP has many applications in balancing robotic systems such as drones and humanoid robots, and controlling it is a complex task that normally requires detailed knowledge of classical control engineering. This paper uses a reinforcement learning (RL) approach to control the RIP instead of classical controllers such as the proportional–integral–derivative (PID) and linear–quadratic regulator (LQR) controllers. In this work, a deep deterministic policy gradient–proximal policy optimization (DDPG–PPO) agent is proposed and implemented to control the rotary inverted pendulum platform both in simulation and on hardware. A DDPG agent with 13 layers is trained for the swing-up action of the pendulum, and the mode-selection process is trained and tested using a PPO agent. The rotary inverted pendulum is controlled using the proposed controller and compared with other RL agents, such as soft actor critic–proximal policy optimization (SAC–PPO). Additionally, the proposed method is compared against a conventional PID controller, for different pendulum mass values, to validate its effectiveness. Finally, the proposed RL controller is implemented on a real-time RIP apparatus (Quanser Qube-Servo). Results show that the DDPG–PPO agent is more effective than the SAC–PPO agent during swing-up control.
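To make the two-agent arrangement concrete, the following Python sketch illustrates, under stated assumptions, how a mode-selection policy can switch between a swing-up policy and a balancing controller within one control loop. The policy functions, state layout, thresholds, and gains are illustrative placeholders only; they are not the trained DDPG and PPO networks or the authors' implementation.

```python
import numpy as np

# Minimal sketch of the hybrid control loop described in the abstract: a
# swing-up policy (standing in for the trained DDPG actor) and a balance
# controller, with a mode selector (standing in for the trained PPO agent)
# choosing between them. All functions and numbers are placeholders, not the
# paper's trained networks or tuned gains.

def swingup_policy(obs):
    """Placeholder for the DDPG swing-up actor: maps the state to a voltage."""
    alpha, alpha_dot = obs[1], obs[3]
    # Simple energy-pumping heuristic used only to stand in for the actor.
    return float(np.clip(2.0 * np.sign(alpha_dot * np.cos(alpha)), -5.0, 5.0))

def mode_selector(obs):
    """Placeholder for the PPO mode-selection agent: 0 = swing-up, 1 = balance."""
    alpha = obs[1]
    # Switch to balance mode when the pendulum is within ~20 deg of upright
    # (assuming alpha = 0 corresponds to the upright position).
    return 1 if abs(((alpha + np.pi) % (2.0 * np.pi)) - np.pi) < 0.35 else 0

def balance_policy(obs):
    """Placeholder stabilizing mode: a fixed state-feedback law near upright."""
    K = np.array([-2.0, 35.0, -1.5, 3.0])  # illustrative gains only
    return float(np.clip(-K @ obs, -5.0, 5.0))

def control_step(obs):
    """One step of the hybrid controller: pick a mode, apply that mode's action."""
    return balance_policy(obs) if mode_selector(obs) == 1 else swingup_policy(obs)

# Example: state assumed to be [theta, alpha, theta_dot, alpha_dot], with the
# pendulum hanging down (alpha = pi) and swinging slowly.
print(control_step(np.array([0.0, np.pi, 0.0, 0.5])))
```

In this arrangement, the mode selector only decides *which* controller acts at each step, while the swing-up and balance policies each produce the actual motor command for their regime; this mirrors the division of labor between the PPO and DDPG agents described above.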

Abbreviations

CAD: Computer-aided design
DAQ: Data acquisition
DC: Direct current
DDPG: Deep deterministic policy gradient
DDQN: Double deep Q-network
DQN: Deep Q-network
LQR: Linear–quadratic regulator
PID: Proportional–integral–derivative
PPO: Proximal policy optimization
RIP: Rotary inverted pendulum
RL: Reinforcement learning
SAC: Soft actor critic
SMC: Sliding mode controller
TF: Transformation frame

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Author information

Corresponding author

Correspondence to Shahpour Alirezaee.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Bhourji, R.S., Mozaffari, S. & Alirezaee, S. Reinforcement Learning DDPG–PPO Agent-Based Control System for Rotary Inverted Pendulum. Arab J Sci Eng 49, 1683–1696 (2024). https://doi.org/10.1007/s13369-023-07934-2
