Abstract
In UAV path planning, classical algorithms can only solve tasks in which the environment is known and can be expressed as a mathematical model. Tasks with unknown environment information are difficult for them to solve, yet humans can solve such tasks easily. In this paper, we first build a UAV path planning environment and train an agent with deep reinforcement learning to stand in for human demonstrations. Then, we use an inverse reinforcement learning algorithm and the processed human demonstrations to learn the human policy. Finally, we test the learned policy on path planning tasks to verify its feasibility.
Acknowledgement
This work was supported by the Science and Technology Innovation 2030-Key Project of “New Generation Artificial Intelligence” under Grant 2020AAA0108200, the National Natural Science Foundation of China under Grants 62103023, 61922008, 61973013, 61873011 and 62103016, the Innovation Zone Project under Grant 18-163-00-TS-001-001-34, the National Defense Project under 201-CXCY-A01-08-00-01, the Foundation Strengthening Program Technology Field Fund under Grant 2019-JCJQ-JJ-243, the Defense Industrial Technology Development Program under Grant JCKY2019601C106, the Young Elite Scientists Sponsorship Program by CAST under Grant 2021QNRC001, the China National Postdoctoral Program for Innovative Talents under Grant BX20200034, and the China Postdoctoral Science Foundation under Grant 2020M680297.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Xu, Z., Dong, J., Hua, Y., Dong, X., Li, Q., Ren, Z. (2023). UAV Path Planning from Human Demonstrations Using Inverse Reinforcement Learning. In: Yan, L., Duan, H., Deng, Y. (eds) Advances in Guidance, Navigation and Control. ICGNC 2022. Lecture Notes in Electrical Engineering, vol 845. Springer, Singapore. https://doi.org/10.1007/978-981-19-6613-2_535
DOI: https://doi.org/10.1007/978-981-19-6613-2_535
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-6612-5
Online ISBN: 978-981-19-6613-2
eBook Packages: Intelligent Technologies and Robotics (R0)