
Optimal Strategy for Aircraft Pursuit-evasion Games via Self-play Iteration

  • Research Article
Machine Intelligence Research

Abstract

In this paper, the pursuit-evasion game with state and control constraints is solved with an iterative self-play technique to obtain the Nash equilibrium for both the pursuer and the evader. Under the condition that the Hamiltonian formed by means of Pontryagin’s maximum principle has a unique solution, it can be proven that the iterative control law converges to the Nash equilibrium solution. However, the strong nonlinearity of the ordinary differential equations formulated by Pontryagin’s maximum principle makes the control policy difficult to obtain. Moreover, the system dynamics employed in this manuscript contain a high-dimensional state vector with constraints. In practical applications, such as the control of aircraft, the available overload is limited. Therefore, in this paper, we consider the optimal strategy of pursuit-evasion games with a constant constraint on the control, while some state variables are restricted by a function of the input. To address these challenges, the optimal control problems are transformed into nonlinear programming problems through the direct collocation method. Finally, two numerical cases of the aircraft pursuit-evasion scenario are given to demonstrate the effectiveness of the presented method in obtaining the optimal control of both the pursuer and the evader.
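The self-play idea in the abstract can be illustrated on a much smaller problem. The sketch below is not the paper's method: it replaces the aircraft dynamics and the direct collocation step with a static, scalar zero-sum game whose best responses have closed forms. The payoff function, the constants `A`, `B`, `C`, and the bounds `U_MAX`, `V_MAX` are all assumptions chosen for illustration. The pursuer minimizes the payoff over `u`, the evader maximizes it over `v`, and the two players take turns answering each other's latest control until neither changes.

```python
# Toy self-play (alternating best-response) iteration on a static zero-sum game.
# Everything here is a hypothetical stand-in for the paper's aircraft model.

A, B, C = 1.0, 0.5, 1.0      # assumed payoff parameters (|C| < 2 gives a contraction)
U_MAX, V_MAX = 1.0, 0.7      # assumed constant bounds on each player's control


def clip(x, lo, hi):
    """Project x onto the interval [lo, hi]."""
    return max(lo, min(hi, x))


def payoff(u, v):
    """Payoff J(u, v): strictly convex in u (pursuer), strictly concave in v (evader)."""
    return (u - A) ** 2 + C * u * v - (v - B) ** 2


def best_response_pursuer(v):
    """Minimize J over u in [-U_MAX, U_MAX]: dJ/du = 2(u - A) + C v = 0, then clip."""
    return clip(A - 0.5 * C * v, -U_MAX, U_MAX)


def best_response_evader(u):
    """Maximize J over v in [-V_MAX, V_MAX]: dJ/dv = C u - 2(v - B) = 0, then clip."""
    return clip(B + 0.5 * C * u, -V_MAX, V_MAX)


def self_play(u0=0.0, v0=0.0, tol=1e-10, max_iter=200):
    """Alternate best responses until both controls stop changing."""
    u, v = u0, v0
    for k in range(1, max_iter + 1):
        u_new = best_response_pursuer(v)
        v_new = best_response_evader(u_new)
        if abs(u_new - u) < tol and abs(v_new - v) < tol:
            return u_new, v_new, k
        u, v = u_new, v_new
    return u, v, max_iter


if __name__ == "__main__":
    u_star, v_star, iters = self_play()
    print(u_star, v_star, iters)
```

With these assumed values the iteration settles at approximately u = 0.65 and v = 0.7, where the evader's bound is active. Because the toy payoff is strictly convex in `u` and strictly concave in `v`, each best response is unique, which loosely mirrors the uniqueness condition the abstract requires for convergence. In the paper's setting each best response is itself a constrained optimal control problem, so the closed-form steps here would be replaced by a nonlinear programming solve.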

Author information

Corresponding author

Correspondence to Jie Zhang.

Ethics declarations

The authors declare that they have no conflicts of interest related to this work.

Additional information

Colored figures are available in the online version at https://springer.longhoe.net/journal/11633

Xin Wang received the B.Sc. degree in electronic information engineering from Zhengzhou University, China in 2012, and the M.Sc. degree in control engineering from University of Science and Technology Beijing, China in 2015. He is currently a Ph.D. degree candidate in control theory and control engineering at State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, China, and University of Chinese Academy of Sciences, China.

His research interests include reinforcement learning, adaptive dynamic programming, optimal control and multi-agent systems.

Qing-Lai Wei received the B.Sc. degree in automation, and the Ph.D. degree in control theory and control engineering from Northeastern University, China in 2002 and 2009, respectively. From 2009 to 2011, he was a postdoctoral fellow with State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, China. He is currently a professor of the institute and the associate director of the laboratory. He has authored four books and published over 80 international journal papers. He has been the Secretary of the IEEE Computational Intelligence Society (CIS) Beijing Chapter since 2015. He has served as a guest editor for several international journals. He was a recipient of the IEEE/CAA Journal of Automatica Sinica Best Paper Award, the IEEE Systems, Man, and Cybernetics Society Andrew P. Sage Best Transactions Paper Award, the IEEE Transactions on Neural Networks and Learning Systems Outstanding Paper Award, the Outstanding Paper Award of Acta Automatica Sinica, the IEEE 6th Data Driven Control and Learning Systems Conference (DDCLS2017) Best Paper Award, and the Zhang Siying Outstanding Paper Award of the Chinese Control and Decision Conference (CCDC). He was also a recipient of Shuang-Chuang Talents in Jiangsu Province, China, the Young Researcher Award of the Asia Pacific Neural Network Society (APNNS), and the Young Scientist Award and Yang Jiachi Tech Award of the Chinese Association of Automation (CAA). He is a Board of Governors (BOG) member of the International Neural Network Society (INNS) and a council member of CAA.

His research interests include adaptive dynamic programming, neural-networks-based control, optimal control, nonlinear systems and their industrial applications.

Tao Li received the B.Sc. degree in automation from Northeastern University, China in 2019. He is currently a Ph.D. degree candidate in control theory and control engineering at State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, China.

His research interests include adaptive dynamic programming, reinforcement learning and approximate dynamic programming.

Jie Zhang received the B.Sc. degree in information and computing science from Tsinghua University, China in 2005, and the Ph.D. degree in technology of computer application from University of Chinese Academy of Sciences, China in 2015. He has been an associate professor with State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, China, since 2016.

His research interests include parallel control, mechanism design, optimal control and multiagent reinforcement learning.

About this article

Cite this article

Wang, X., Wei, QL., Li, T. et al. Optimal Strategy for Aircraft Pursuit-evasion Games via Self-play Iteration. Mach. Intell. Res. 21, 585–596 (2024). https://doi.org/10.1007/s11633-022-1413-5

  • DOI: https://doi.org/10.1007/s11633-022-1413-5
