Abstract
The model-free paradigm of reinforcement learning (RL) is a theoretical strength. In practice, however, the stringent assumptions required for optimal solutions (full exploration of the state space) and experimental issues, such as slow learning rates, render model-free RL a practical weakness. This chapter addresses practical implementations of RL by interfacing elements of systems and control and of robotics. In our approach, space is handled by sequential composition (a technique commonly used in robotics) and time is handled by passivity-based control methods (a standard nonlinear control approach), which speed up learning and provide a stopping-time criterion. Sequential composition in effect partitions the state space and allows controllers with different domains of attraction (DoA) and goal sets to be composed, so that learning takes place in subsets of the state space. Passivity-based control (PBC) is a model-based control approach in which the total energy is computable. This total energy can serve as a candidate Lyapunov function to evaluate the stability of a controller and to estimate its DoA. This enables learning in finite time: while learning, the candidate Lyapunov function is monitored online to approximate the DoA of the learned controller. Once this DoA covers the relevant states, from the point of view of sequential composition, the learning process is stopped. The result is a collection of learned controllers that cover a desired range of the state space and can be composed in sequence to achieve various goals. Optimality is lost in favour of practicality. Other implications include safety while learning and incremental learning.
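As a concrete illustration of the stopping criterion described above, the following minimal Python sketch monitors a candidate Lyapunov function V online along closed-loop trajectories to build an empirical DoA estimate, and stops "learning" once that estimate covers a set of handover states. The toy plant (x' = x^3 + u), the linear policy u = -k*x, the quadratic storage function, and all numerical values are illustrative assumptions, not the chapter's implementation (there, V is the total energy obtained from PBC and the policy is updated by an RL algorithm).

```python
import numpy as np

# Sketch of the DoA-monitored stopping criterion (toy example, not the
# authors' implementation). The plant x' = x^3 + u under u = -k*x has the
# closed loop x' = x^3 - k*x, whose origin has DoA (-sqrt(k), sqrt(k)):
# larger gains yield larger verified DoAs, mimicking a policy that
# improves with learning.

def closed_loop(x, k):
    """Closed-loop vector field of the toy plant under the current policy."""
    return x**3 - k * x

def V(x):
    """Candidate Lyapunov function (here a simple quadratic storage;
    in the chapter, the PBC total energy plays this role)."""
    return 0.5 * x**2

def verified_in_doa(x0, k, dt=0.01, horizon=5000, eps=1e-6):
    """Simulate from x0 and monitor V online: accept x0 into the DoA
    estimate only if V decreases monotonically and reaches (near) zero."""
    x = x0
    for _ in range(horizon):
        x_next = x + dt * closed_loop(x, k)
        if V(x_next) > V(x) + 1e-9:   # V increased: outside the estimate
            return False
        x = x_next
        if V(x) < eps:                # reached the goal neighbourhood
            return True
    return False                      # no convergence within the horizon

# States at which preceding controllers in the sequential composition hand
# over (hypothetical values); the DoA must cover them before we may stop.
handover_states = [1.8, -2.2]

k = 1.0
while not all(verified_in_doa(x0, k) for x0 in handover_states):
    k += 1.0   # stand-in for one reinforcement-learning policy update

print(f"learning stopped: DoA covers all handover states at gain k = {k}")
```

In the sequential-composition picture, the handover states stand in for the goal sets of neighbouring controllers: a composition is valid once each controller's goal set lies inside the (verified) DoA of its successor, which is exactly the condition the monitor checks before learning is allowed to stop.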
Notes
- 1. Our definition of practicality: the algorithm runs in finite time with finite memory, and results in guaranteed stability/convergence, assuming a known model of the system.
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Lopes, G.A.D., Najafi, E., Nageshrao, S.P., Babuška, R. (2015). Learning Complex Behaviors via Sequential Composition and Passivity-Based Control. In: Busoniu, L., Tamás, L. (eds) Handling Uncertainty and Networked Structure in Robot Control. Studies in Systems, Decision and Control, vol 42. Springer, Cham. https://doi.org/10.1007/978-3-319-26327-4_3
DOI: https://doi.org/10.1007/978-3-319-26327-4_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26325-0
Online ISBN: 978-3-319-26327-4