Decomposition methods with deep corrections for reinforcement learning

Published in: Autonomous Agents and Multi-Agent Systems

Abstract

Decomposition methods have been proposed to approximate solutions to large sequential decision-making problems. In contexts where an agent interacts with multiple entities, utility decomposition can be used to separate the global objective into local tasks, each considering a single entity independently. An arbitrator is then responsible for combining the individual utilities and selecting an action in real time to solve the global problem. Although these techniques can perform well empirically, they rely on strong assumptions of independence between the local tasks and sacrifice the optimality of the global solution. This paper proposes an approach that improves upon such approximate solutions by learning a correction term represented by a neural network. We demonstrate this approach on a fisheries management problem, where multiple boats must coordinate to maximize their catch over time, and on a pedestrian avoidance problem for autonomous driving. In each problem, decomposition methods scale to multiple boats or pedestrians by combining strategies computed for a single entity. We verify empirically that the proposed correction method significantly improves on the decomposition method alone and outperforms a policy trained on the full-scale problem without utility decomposition.
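The core idea described in the abstract, combining per-entity utilities through an arbitrator and adding a learned correction over the joint state, can be summarized in a short sketch. The following Python/PyTorch code is a minimal illustration under assumptions made for the example, not the authors' implementation: the class names, network sizes, state dimensions, and the summation arbitration rule are all hypothetical.

# Minimal sketch of utility decomposition with an additive deep correction,
# assuming discrete actions and a fixed number of entities. All names and
# dimensions below are illustrative.
import torch
import torch.nn as nn

N_ACTIONS = 4    # discrete action set (assumed)
ENTITY_DIM = 3   # per-entity state features (assumed)
N_ENTITIES = 5   # e.g. number of boats or pedestrians

class SingleEntityQ(nn.Module):
    """Q-function for the one-entity subproblem, solved or trained offline."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(ENTITY_DIM, 32), nn.ReLU(),
                                 nn.Linear(32, N_ACTIONS))

    def forward(self, entity_states):
        # entity_states: (N_ENTITIES, ENTITY_DIM) -> (N_ENTITIES, N_ACTIONS)
        return self.net(entity_states)

class CorrectionNetwork(nn.Module):
    """Neural-network correction term defined over the full joint state."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(N_ENTITIES * ENTITY_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, N_ACTIONS))

    def forward(self, global_state):
        # global_state: flattened joint state -> (N_ACTIONS,)
        return self.net(global_state)

def corrected_q(single_q, correction, entity_states):
    """Decomposed utility plus the learned additive correction."""
    # Utility decomposition: combine per-entity Q-values; summation is one
    # common fusion rule (min or max are alternatives for safety tasks).
    decomposed = single_q(entity_states).sum(dim=0)
    # Additive correction compensating for the independence assumption.
    return decomposed + correction(entity_states.flatten())

if __name__ == "__main__":
    single_q, correction = SingleEntityQ(), CorrectionNetwork()
    states = torch.randn(N_ENTITIES, ENTITY_DIM)
    q = corrected_q(single_q, correction, states)
    action = int(torch.argmax(q))  # arbitrator selects the best corrected action
    print(q, action)

In this sketch only the small correction network would be trained on the full-scale problem, which is the appeal of the approach: the single-entity Q-function is reused as-is, and the correction only has to learn the residual interaction effects between entities.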



Acknowledgements

Funding was provided by the Honda Research Institute (US) (Grant No. 124232) and the National Science Foundation Graduate Research Fellowship Program (US) (Grant No. DGE-1656518).

Author information

Corresponding author

Correspondence to Maxime Bouton.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Bouton, M., Julian, K.D., Nakhaei, A. et al. Decomposition methods with deep corrections for reinforcement learning. Auton Agent Multi-Agent Syst 33, 330–352 (2019). https://doi.org/10.1007/s10458-019-09407-z

