Abstract
The rotary inverted pendulum (RIP) is a nonlinear system widely used as a benchmark for testing control strategies, with applications in balancing robotic systems such as drones and humanoid robots. Controlling the RIP is a complex task without thorough knowledge of classical control engineering. This paper uses a reinforcement learning (RL) approach to control the RIP instead of classical controllers such as the proportional–integral–derivative (PID) controller and the linear–quadratic regulator (LQR). A combined deep deterministic policy gradient–proximal policy optimization (DDPG–PPO) agent is proposed and implemented to control a rotary inverted pendulum platform in both simulation and hardware. A 13-layer DDPG agent is trained for the swing-up action of the pendulum, and the mode-selection process is trained and tested using a PPO agent. The proposed controller is compared with other RL agents, such as a soft actor critic–proximal policy optimization (SAC–PPO) agent. Additionally, the proposed method is tested against a conventional PID controller, for different pendulum mass values, to validate its effectiveness. Finally, the proposed RL controller is implemented on a real-time RIP apparatus (Quanser Qube-Servo). Results show that the DDPG–PPO agent is more effective than the SAC–PPO agent during swing-up control.
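The abstract describes a two-part architecture: one policy handles swing-up and a second agent decides when to hand control over to stabilization. The following Python sketch illustrates that mode-switching structure only; the policies and the fixed angle threshold `BALANCE_THRESHOLD` are hypothetical stand-ins (in the paper, swing-up is a trained DDPG network and mode selection is a trained PPO agent, not hand-coded rules).

```python
import math

# Hypothetical hand-over threshold (radians from upright); in the paper
# this switching behaviour is learned by a PPO agent, not fixed.
BALANCE_THRESHOLD = 0.3


def swing_up_action(theta, theta_dot):
    """Energy-pumping swing-up (stand-in for the trained DDPG policy)."""
    # Push in the direction that adds energy toward the upright equilibrium.
    return math.copysign(1.0, theta_dot * math.cos(theta))


def balance_action(theta, theta_dot):
    """Linear stabilizing feedback (stand-in for the balance-mode policy);
    gains are illustrative, not tuned for real hardware."""
    return -(20.0 * theta + 2.0 * theta_dot)


def select_mode(theta):
    """Rule-based stand-in for the PPO mode-selection agent."""
    return "balance" if abs(theta) < BALANCE_THRESHOLD else "swing_up"


def control(theta, theta_dot):
    """One controller step: pick a mode, then apply that mode's action."""
    if select_mode(theta) == "balance":
        return balance_action(theta, theta_dot)
    return swing_up_action(theta, theta_dot)
```

Splitting the task this way is the key design point the abstract reports: swing-up and stabilization have very different dynamics, so each regime gets its own policy and a separate agent arbitrates between them.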
Abbreviations
- CAD: Computer-aided design
- DAQ: Data acquisition
- DC: Direct current
- DDPG: Deep deterministic policy gradient
- DDQN: Double deep Q-network
- DQN: Deep Q-network
- LQR: Linear–quadratic regulator
- PID: Proportional–integral–derivative
- PPO: Proximal policy optimization
- RIP: Rotary inverted pendulum
- RL: Reinforcement learning
- SAC: Soft actor critic
- SMC: Sliding mode controller
- TF: Transformation frame
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Cite this article
Bhourji, R.S., Mozaffari, S. & Alirezaee, S. Reinforcement Learning DDPG–PPO Agent-Based Control System for Rotary Inverted Pendulum. Arab J Sci Eng 49, 1683–1696 (2024). https://doi.org/10.1007/s13369-023-07934-2