A Linear Online Guided Policy Search Algorithm

Sun, Biao; Xiong, Fangzhou; Liu, Zhiyong; Yang, Xu; Qiao, Hong

doi:10.1007/978-3-319-70139-4_44

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10638)

Included in the following conference series: International Conference on Neural Information Processing

Abstract

In reinforcement learning (RL), guided policy search (GPS), a variant of the policy search method, encodes the policy directly and searches for optimal solutions in policy space. Although GPS comes with asymptotic local convergence guarantees, it cannot operate online in complex environments, because it is trained in batch mode, which requires all training samples to be available at the same time. In this paper, we propose an online version of the GPS algorithm that learns policies incrementally, without complete knowledge of the initial positions used for training. Experiments on a peg insertion task demonstrate its efficacy in handling sequentially arriving training samples.

Acknowledgments

This work is partly supported by NSFC grants 61375005, U1613213, 61702516, and 61210009; MOST grants 2015BAK35B00 and 2015BAK35B01; and Guangdong Science and Technology Department grant 2016B090910001.

Author information

Authors and Affiliations

University of Science and Technology Beijing, Beijing, 100083, China
Biao Sun & Hong Qiao
The State Key Lab of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
Biao Sun, Fangzhou Xiong, Zhiyong Liu, Xu Yang & Hong Qiao
School of Computer and Control, University of Chinese Academy of Sciences (UCAS), Beijing, 100049, China
Fangzhou Xiong, Zhiyong Liu & Hong Qiao
CAS Centre for Excellence in Brain Science and Intelligence Technology (CEBSIT), Shanghai, 200031, China
Zhiyong Liu & Hong Qiao
Cloud Computing Center, Chinese Academy of Sciences, Dongguan, 523808, Guangdong, China
Zhiyong Liu & Hong Qiao

Corresponding author

Correspondence to Zhiyong Liu.

Editor information

Editors and Affiliations

Guangdong University of Technology, Guangzhou, China
Derong Liu
Guangdong University of Technology, Guangzhou, China
Shengli Xie
South China University of Technology, Guangzhou, China
Yuanqing Li
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Dongbin Zhao
King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia
El-Sayed M. El-Alfy

About this paper

Cite this paper

Sun, B., Xiong, F., Liu, Z., Yang, X., Qiao, H. (2017). A Linear Online Guided Policy Search Algorithm. In: Liu, D., Xie, S., Li, Y., Zhao, D., El-Alfy, ES. (eds) Neural Information Processing. ICONIP 2017. Lecture Notes in Computer Science, vol 10638. Springer, Cham. https://doi.org/10.1007/978-3-319-70139-4_44

DOI: https://doi.org/10.1007/978-3-319-70139-4_44
Published: 29 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-70138-7
Online ISBN: 978-3-319-70139-4
eBook Packages: Computer Science, Computer Science (R0)
