Reinforcement Learning DDPG–PPO Agent-Based Control System for Rotary Inverted Pendulum

  • Research Article – Mechanical Engineering
  • Published in: Arabian Journal for Science and Engineering

Abstract

The rotary inverted pendulum (RIP) is a nonlinear system widely used as a benchmark for testing control strategies. The RIP has many applications in balancing robotic systems such as drones and humanoid robots, and controlling it is a complex task that normally requires detailed knowledge of classical control engineering. This paper uses a reinforcement learning (RL) approach to control the RIP instead of classical controllers such as the proportional–integral–derivative (PID) and linear–quadratic regulator (LQR) controllers. In this work, a deep deterministic policy gradient–proximal policy optimization (DDPG–PPO) agent is proposed and implemented to control the rotary inverted pendulum platform both in simulation and on hardware. A DDPG agent with 13 layers is trained for the swing-up action of the pendulum, and the mode-selection process is trained and tested using a PPO agent. The rotary inverted pendulum is controlled using the proposed controller and compared with other RL agents, such as soft actor critic–proximal policy optimization (SAC–PPO). Additionally, the proposed method is compared against a conventional PID controller, for different pendulum mass values, to validate its effectiveness. Finally, the proposed RL controller is implemented on a real-time RIP apparatus (Quanser Qube-Servo). Results show that the DDPG–PPO agent is more effective than the SAC–PPO agent during swing-up control.
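To make the two-agent arrangement concrete, the following Python sketch illustrates, under stated assumptions, how a mode-selection policy can switch between a swing-up policy and a balancing controller within one control loop. The policy functions, state layout, thresholds, and gains are illustrative placeholders only; they are not the trained DDPG and PPO networks or the authors' implementation.

```python
import numpy as np

# Minimal sketch of the hybrid control loop described in the abstract: a
# swing-up policy (standing in for the trained DDPG actor) and a balance
# controller, with a mode selector (standing in for the trained PPO agent)
# choosing between them. All functions and numbers are placeholders, not the
# paper's trained networks or tuned gains.

def swingup_policy(obs):
    """Placeholder for the DDPG swing-up actor: maps the state to a voltage."""
    alpha, alpha_dot = obs[1], obs[3]
    # Simple energy-pumping heuristic used only to stand in for the actor.
    return float(np.clip(2.0 * np.sign(alpha_dot * np.cos(alpha)), -5.0, 5.0))

def mode_selector(obs):
    """Placeholder for the PPO mode-selection agent: 0 = swing-up, 1 = balance."""
    alpha = obs[1]
    # Switch to balance mode when the pendulum is within ~20 deg of upright
    # (assuming alpha = 0 corresponds to the upright position).
    return 1 if abs(((alpha + np.pi) % (2.0 * np.pi)) - np.pi) < 0.35 else 0

def balance_policy(obs):
    """Placeholder stabilizing mode: a fixed state-feedback law near upright."""
    K = np.array([-2.0, 35.0, -1.5, 3.0])  # illustrative gains only
    return float(np.clip(-K @ obs, -5.0, 5.0))

def control_step(obs):
    """One step of the hybrid controller: pick a mode, apply that mode's action."""
    return balance_policy(obs) if mode_selector(obs) == 1 else swingup_policy(obs)

# Example: state assumed to be [theta, alpha, theta_dot, alpha_dot], with the
# pendulum hanging down (alpha = pi) and swinging slowly.
print(control_step(np.array([0.0, np.pi, 0.0, 0.5])))
```

In this arrangement, the mode selector only decides *which* controller acts at each step, while the swing-up and balance policies each produce the actual motor command for their regime; this mirrors the division of labor between the PPO and DDPG agents described above.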

Abbreviations

CAD: Computer-aided design
DAQ: Data acquisition
DC: Direct current
DDPG: Deep deterministic policy gradient
DDQN: Double deep Q-network
DQN: Deep Q-network
LQR: Linear–quadratic regulator
PID: Proportional–integral–derivative
PPO: Proximal policy optimization
RIP: Rotary inverted pendulum
RL: Reinforcement learning
SAC: Soft actor critic
SMC: Sliding mode controller
TF: Transformation frame

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Author information

Corresponding author

Correspondence to Shahpour Alirezaee.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Bhourji, R.S., Mozaffari, S. & Alirezaee, S. Reinforcement Learning DDPG–PPO Agent-Based Control System for Rotary Inverted Pendulum. Arab J Sci Eng 49, 1683–1696 (2024). https://doi.org/10.1007/s13369-023-07934-2
