Abstract
In this paper we use the Proximal Policy Optimization (PPO) deep reinforcement learning algorithm to train a neural network to control a four-legged robot in simulation. Reinforcement learning in general can learn complex behavior policies from simple datasets of state-reward tuples, and PPO in particular has proven effective at solving complex tasks with continuous state and action spaces. Moreover, since it is model-free, the approach is general and can adapt to changes in the environment or in the robot itself.
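The core of PPO is its clipped surrogate objective, which limits how far each policy update can move from the data-collecting policy. A minimal sketch in NumPy (the function name and the toy inputs are illustrative, not from the paper):

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Clipped surrogate loss to be minimized.

    ratio     -- pi_new(a|s) / pi_old(a|s), per sample
    advantage -- advantage estimates (e.g. from GAE), per sample
    eps       -- clipping range; 0.2 is the value suggested by Schulman et al.
    """
    unclipped = ratio * advantage
    # Clipping removes the incentive to push the ratio outside [1-eps, 1+eps].
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Take the pessimistic (lower) bound, then negate for gradient descent.
    return -np.mean(np.minimum(unclipped, clipped))
```

For example, with a positive advantage a ratio of 2.0 contributes only 1.2 to the objective rather than 2.0, so the gradient stops rewarding further policy divergence.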
The virtual environment used to train the agent was modeled with our physics engine, Project Chrono. Chrono can handle nonsmooth dynamics simulation, allowing us to introduce stiff leg-ground contacts, and through its Python interface PyChrono it can easily be coupled with the machine learning framework TensorFlow. We trained the neural network until it learned to control the motor torques, and then compared various choices of policy-network input state.
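Coupling a multibody simulator with an RL framework typically means exposing the simulation through a Gym-style step/reset interface. A self-contained sketch of that pattern follows; the class name, the effort-penalty reward, and the toy integrator are assumptions for illustration — the real environment would advance a PyChrono system instead:

```python
import numpy as np

class QuadrupedEnv:
    """Gym-style environment sketch; a real version would step a
    PyChrono multibody system with stiff leg-ground contacts."""

    def __init__(self, n_joints=8, dt=0.01):
        self.n_joints = n_joints
        self.dt = dt
        # Observation: joint angles followed by joint velocities.
        self.state = np.zeros(2 * n_joints)

    def reset(self):
        self.state = np.zeros(2 * self.n_joints)
        return self.state.copy()

    def step(self, torques):
        # Placeholder point-mass dynamics standing in for the
        # physics engine's time-stepping call.
        q, qd = np.split(self.state, 2)
        qd = qd + self.dt * np.asarray(torques, dtype=float)
        q = q + self.dt * qd
        self.state = np.concatenate([q, qd])
        # Hypothetical reward: penalize actuation effort.
        reward = -float(np.sum(np.square(torques)))
        done = False
        return self.state.copy(), reward, done, {}
```

With this interface the training loop only sees `(state, reward)` tuples, so the same PPO code runs unchanged whichever state representation is fed to the policy network.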
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Benatti, S., Tasora, A., Mangoni, D. (2020). Training a Four Legged Robot via Deep Reinforcement Learning and Multibody Simulation. In: Kecskeméthy, A., Geu Flores, F. (eds) Multibody Dynamics 2019. ECCOMAS 2019. Computational Methods in Applied Sciences, vol 53. Springer, Cham. https://doi.org/10.1007/978-3-030-23132-3_47
Print ISBN: 978-3-030-23131-6
Online ISBN: 978-3-030-23132-3