Log in

Sparse online kernelized actor-critic Learning in reproducing kernel Hilbert space

  • Published:
Artificial Intelligence Review Aims and scope Submit manuscript

Abstract

In this paper, we develop a novel non-parametric online actor-critic reinforcement learning (RL) algorithm to solve optimal regulation problems for a class of continuous-time affine nonlinear dynamical systems. To deal with the value function approximation (VFA) with inherent nonlinear and unknown structure, a reproducing kernel Hilbert space (RKHS)-based kernelized method is designed through online sparsification, where the dictionary size is fixed and consists of updated elements. In addition, the linear independence check condition, i.e., an online criteria, is designed to determine whether the online data should be inserted into the dictionary. The RHKS-based kernelized VFA has a variable structure in accordance with the online data collection, which is different from classical parametric VFA methods with a fixed structure. Furthermore, we develop a sparse online kernelized actor-critic learning RL method to learn the unknown optimal value function and the optimal control policy in an adaptive fashion. The convergence of the presented kernelized actor-critic learning method to the optimum is provided. The boundedness of the closed-loop signals during the online learning phase can be guaranteed. Finally, a simulation example is conducted to demonstrate the effectiveness of the presented kernelized actor-critic learning algorithm.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
EUR 32.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or Ebook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13

Similar content being viewed by others

References

  • Anderson BD, Moore JB (2007) Optimal control: linear quadratic methods. Courier Corporation, HH

    Google Scholar 

  • Bellman R (1966) Dynamic programming. Science 153(3731):34–37

    MATH  Google Scholar 

  • Berlinet A, Thomas-Agnan C (2011) Reproducing kernel Hilbert spaces in probability and statistics. Springer Science & Business Media, Berlin

    MATH  Google Scholar 

  • Bishop CM (2006) Pattern recognition and machine learning. Springer, Berlin

    MATH  Google Scholar 

  • Boyd S, Boyd SP, Vandenberghe L (2004) Convex optimization. Cambridge University Press, Cambridge

    MATH  Google Scholar 

  • Cortes C, Vapnik V (1995) Support-vector networks. Mach learn 20(3):273–297

    MATH  Google Scholar 

  • Hall P, Park BU, Samworth RJ et al (2008) Choice of neighbor order in nearest-neighbor classification. Annals Stat 36(5):2135–2152

    MathSciNet  MATH  Google Scholar 

  • Härdle W, Simar L (2015) “Applied multivariate statistical analysis course,”

  • Haykin SS (2009) “Neural networks and learning machines,”

  • Ioannou PA, Sun J (2012) Robust adaptive control. Courier Corporation, HH

    MATH  Google Scholar 

  • Jiang Y, Jiang Z-P (2015) Global adaptive dynamic programming for continuous-time nonlinear systems. IEEE Trans Autom Control 60(11):2917–2929

    MathSciNet  MATH  Google Scholar 

  • Khalil HK, Grizzle JW (2002) Nonlinear systems. Prentice hall Upper Saddle River, NJ

    Google Scholar 

  • Kingravi HA, Chowdhary G, Vela PA, Johnson EN (2012) Reproducing kernel Hilbert space approach for the online update of radial bases in neuro-adaptive control. IEEE Trans Neural Netw Learn Syst 23(7):1130–1141

    Google Scholar 

  • Krstic M, Kokotovic PV, Kanellakopoulos I (1995) Nonlinear and adaptive control design. John Wiley & Sons Inc, Hoboken

    MATH  Google Scholar 

  • Kung SY (2014) Kernel methods and machine learning. Cambridge University Press, Cambridge

    MATH  Google Scholar 

  • Lewis FL, Liu D (2013) Reinforcement learning and approximate dynamic programming for feedback control. John Wiley & Sons, Hoboken

    Google Scholar 

  • Lewis FL, Yesildirak A, Jagannathan S (1998) Neural network control of robot manipulators and nonlinear systems. Taylor & Francis Inc, USA

    Google Scholar 

  • Lewis FL, Vrabie D, Syrmos VL (2012) Optimal control. John Wiley & Sons, Hoboken

    MATH  Google Scholar 

  • Liang M, Wang D, Liu D (2020) Neuro-optimal control for discrete stochastic processes via a novel policy iteration algorithm. IEEE Trans Syst Man Cybern Syst 50(11):3972–3985

    Google Scholar 

  • Liberzon D (2011) Calculus of variations and optimal control theory. Princeton university press, Princeton, NJ, USA

    MATH  Google Scholar 

  • Lin H, Zhao B, Liu D, Alippi C (2020) Data-based fault tolerant control for affine nonlinear systems through particle swarm optimized neural networks. IEEE/CAA J Automatica Sinica 7(4):954–964

    MathSciNet  Google Scholar 

  • Liu D, Wei Q (2014) Policy iteration adaptive dynamic programming algorithm for discrete-time nonlinear systems. IEEE Trans Neural Netw Learn Syst 25(3):621–634

    Google Scholar 

  • Liu D, Wang D, Wang F, Li H, Yang X (2014) Neural-network-based online hjb solution for optimal robust guaranteed cost control of continuous-time uncertain nonlinear systems. IEEE Trans Cybern 44(12):2834–2847

    Google Scholar 

  • Liu D, Xue S, Zhao B, Luo B, Wei Q (2021) Adaptive dynamic programming for control: a survey and recent advances. IEEE Trans Syst Man Cybern Syst 51(1):142–160

    Google Scholar 

  • Nguyen-Tuong D, Schölkopf B, Peters J (2009) “Sparse online model learning for robot control with support vector regression,” in 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, pp. 3121–3126

  • Poggio T, Girosi F (1990) Networks for approximation and learning. Proceedings of the IEEE 78(9):1481–1497

  • Prajna S, Papachristodoulou A, Parrilo PA (2002) “Introducing sostools: A general purpose sum of squares programming solver,” in Proceedings of the 41st IEEE Conference on Decision and Control, 2002., vol. 1. IEEE, pp. 741–746

  • Russell S, Norvig P (2002) Artificial intelligence: a modern approach. Prentice Hall, Upper Saddle River, New Jersey, USA

    MATH  Google Scholar 

  • Sarangapani J (2018) Neural network control of nonlinear discrete-time systems. CRC Press, Cambridge

    Google Scholar 

  • Sepulchre R, Jankovic M, Kokotovic PV (2012) Constructive nonlinear control. Springer Science & Business Media, HH

    MATH  Google Scholar 

  • Speyer JL, Jacobson DH (2010) Primer on optimal control theory. SIAM

  • Tao G, Kokotovic PV (1996) Adaptive control of systems with actuator and sensor nonlinearities. John Wiley & Sons Inc, Hoboken

    MATH  Google Scholar 

  • Vamvoudakis KG, Lewis FL (2010) Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem. Automatica 46(5):878–888

    MathSciNet  MATH  Google Scholar 

  • Vamvoudakis KG, Lewis FL, Hudas GR (2012) Multi-agent differential graphical games: Online adaptive learning solution for synchronization with optimality. Automatica 48(8):1598–1611

    MathSciNet  MATH  Google Scholar 

  • Wang F-Y, Zhang H, Liu D (2009) Adaptive dynamic programming: an introduction. IEEE Comput Intell Magazine 4(2):39–47

    Google Scholar 

  • Wang M, Ge SS, Hong K (2010) Approximation-based adaptive tracking control of pure-feedback nonlinear systems with multiple unknown time-varying delays. IEEE Trans Neural Netw 21(11):1804–1816

    Google Scholar 

  • Witten IH, Frank E (2002) Data mining: practical machine learning tools and techniques with java implementations. Acm Sigmod Record 31(1):76–77

    Google Scholar 

  • Wu X, Kumar V, Quinlan JR, Ghosh J, Yang Q, Motoda H, McLachlan GJ, Ng A, Liu B, Philip SY et al (2008) Top 10 algorithms in data mining. Knowledge Inform Syst 14(1):1–37

    Google Scholar 

  • Xue S, Luo B, Liu D (2020a) Event-triggered adaptive dynamic programming for zero-sum game of partially unknown continuous-time nonlinear systems. IEEE Trans Syst Man Cybern Syst 50(9):3189–3199

    Google Scholar 

  • Xue S, Luo B, Liu D, Yang Y (2020b) Constrained event-triggered H\(_\infty \) control based on adaptive dynamic programming with concurrent learning. In: IEEE Transactions on Systems, Man, and Cybernetics: Systems. https://doi.org/10.1109/TSMC.2020.2997559

  • Yang Y, Xu C-Z (2020) Adaptive fuzzy leader-follower synchronization of constrained heterogeneous multiagent systems. IEEE Trans Fuzzy Syst. https://doi.org/10.1109/TFUZZ.2020.3021714

    Article  Google Scholar 

  • Yang Y, Wunsch D, Yin Y (2017) Hamiltonian-driven adaptive dynamic programming for continuous nonlinear dynamical systems. IEEE Trans Neural Netw Learn Syst 28(8):1929–1940

    MathSciNet  Google Scholar 

  • Yang Y, Modares H, Wunsch DC, Yin Y (2018) Leader-follower output synchronization of linear heterogeneous systems with active leader using reinforcement learning. IEEE Trans Neural Netw Learn Syst 29(6):2139–2153

    MathSciNet  Google Scholar 

  • Yang Y, Modares H, Wunsch DC, Yin Y (2019) Optimal containment control of unknown heterogeneous systems with active leaders. IEEE Trans Control Syst Technol 27(3):1228–1236

    Google Scholar 

  • Yang Y, Ding D-W, **ong H, Yin Y, Wunsch DC (2020a) Online barrier-actor-critic learning for H\(_\infty \) control with full-state constraints and input saturation. J Franklin Inst 357(6):3316–3344

    MathSciNet  MATH  Google Scholar 

  • Yang Y, Vamvoudakis KG, Modares H, Yin Y, Wunsch DC (2020b) Safe intermittent reinforcement learning with static and dynamic event generators. IEEE Trans Neural Netw Learn Syst 31(12):5441–5455

    MathSciNet  Google Scholar 

  • Yang Y, Vamvoudakis KG, Modares H, Yin Y, Wunsch DC (2020c) Hamiltonian-driven hybrid adaptive dynamic programming. IEEE Trans Syst Man Cybern Syst. https://doi.org/10.1109/TSMC.2019.2962103

    Article  Google Scholar 

  • Yang Y, Gao W, Modares H, Xu C-Z (2021a) Robust actor-critic learning for continuous-time nonlinear systems with unmodeled dynamics. IEEE Trans Fuzzy Syst. https://doi.org/10.1109/TFUZZ.2021.3075501

    Article  Google Scholar 

  • Yang Y, Mazouchi M, Modares H (2021b) Hamiltonian-driven adaptive dynamic programming for mixed H\(_2\)/H\(_\infty \) performance using sum-of-squares. Int J Robust Nonlinear Control 31(6):1941–1963

    Google Scholar 

  • Zhao B, Liu D (2020) Event-triggered decentralized tracking control of modular reconfigurable robots through adaptive dynamic programming. IEEE Trans Indus Electron 67(4):3054–3064

    Google Scholar 

  • Zhao B, Liu D, Li Y (2017) Observer based adaptive dynamic programming for fault tolerant control of a class of nonlinear systems. Inform Sci 384:21–33

    MATH  Google Scholar 

  • Zhao B, Wang D, Shi G, Liu D, Li Y (2018) Decentralized control for large-scale nonlinear systems with unknown mismatched interconnections via policy iteration. IEEE Trans Syst Man Cybern Syst 48(10):1725–1735

    Google Scholar 

  • Zhao B, Liu D, Luo C (2020) Reinforcement learning-based optimal stabilization for unknown nonlinear systems subject to inputs with uncertain constraints. IEEE Trans Neural Netw Learn Syst 31(10):4330–4340

    MathSciNet  Google Scholar 

  • Zhang H, Liu D (2006) Fuzzy modeling and fuzzy control. Springer, Berlin

    MATH  Google Scholar 

  • Zhang Q, Zhao D (2019) Data-based reinforcement learning for nonzero-sum games with unknown drift dynamics. IEEE Trans Cybern 49(8):2874–2885

    Google Scholar 

  • Zhang Q, Zhao D, Zhu Y (2017) Event-triggered \({H}_\infty \) control for continuous-time nonlinear system via concurrent learning. IEEE Trans Syst Man Cybern Syst 47(7):1071–1081

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Bo Zhao.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported in part by the National Natural Science Foundation of China under Grants 61903028, 61973330, 61803371 and 61773075, in part by the Bei**g Natural Science Foundation under Grant 4212038, in part by the Open Research Project of the State Key Laboratory of Management and Control for Complex Systems, Institute of Sciences under Grant 20210108, in part by the Open Research Project of the State Key Laboratory of Industrial Control Technology, Zhejiang University, China under Grant ICT2021B48, in part by the Fundamental Research Funds for the Central Universities under Grant 2019NTST25, and in part by the State Key Laboratory of Synthetical Automation for Process Industries under Grant 2019-KF-23-03.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Yang, Y., Zhu, H., Zhang, Q. et al. Sparse online kernelized actor-critic Learning in reproducing kernel Hilbert space. Artif Intell Rev 55, 23–58 (2022). https://doi.org/10.1007/s10462-021-10045-9

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10462-021-10045-9

Keywords

Navigation