Decomposition methods with deep corrections for reinforcement learning

Published in: Autonomous Agents and Multi-Agent Systems

Abstract

Decomposition methods have been proposed to approximate solutions to large sequential decision-making problems. In contexts where an agent interacts with multiple entities, utility decomposition can be used to separate the global objective into local tasks, each considering a single entity independently. An arbitrator is then responsible for combining the individual utilities and selecting an action in real time to solve the global problem. Although these techniques can perform well empirically, they rely on strong assumptions of independence between the local tasks and sacrifice the optimality of the global solution. This paper proposes an approach that improves upon such approximate solutions by learning a correction term represented by a neural network. We demonstrate this approach on a fisheries management problem, where multiple boats must coordinate to maximize their catch over time, and on a pedestrian avoidance problem for autonomous driving. In each problem, decomposition methods scale to multiple boats or pedestrians by combining strategies computed for a single entity. We verify empirically that the proposed correction method significantly improves on the decomposition method alone and outperforms a policy trained on the full-scale problem without utility decomposition.
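The core idea described in the abstract, combining per-entity utilities through an arbitrator and adding a learned correction over the joint state, can be summarized in a short sketch. The following Python/PyTorch code is a minimal illustration under assumptions made for the example, not the authors' implementation: the class names, network sizes, state dimensions, and the summation arbitration rule are all hypothetical.

# Minimal sketch of utility decomposition with an additive deep correction,
# assuming discrete actions and a fixed number of entities. All names and
# dimensions below are illustrative.
import torch
import torch.nn as nn

N_ACTIONS = 4    # discrete action set (assumed)
ENTITY_DIM = 3   # per-entity state features (assumed)
N_ENTITIES = 5   # e.g. number of boats or pedestrians

class SingleEntityQ(nn.Module):
    """Q-function for the one-entity subproblem, solved or trained offline."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(ENTITY_DIM, 32), nn.ReLU(),
                                 nn.Linear(32, N_ACTIONS))

    def forward(self, entity_states):
        # entity_states: (N_ENTITIES, ENTITY_DIM) -> (N_ENTITIES, N_ACTIONS)
        return self.net(entity_states)

class CorrectionNetwork(nn.Module):
    """Neural-network correction term defined over the full joint state."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(N_ENTITIES * ENTITY_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, N_ACTIONS))

    def forward(self, global_state):
        # global_state: flattened joint state -> (N_ACTIONS,)
        return self.net(global_state)

def corrected_q(single_q, correction, entity_states):
    """Decomposed utility plus the learned additive correction."""
    # Utility decomposition: combine per-entity Q-values; summation is one
    # common fusion rule (min or max are alternatives for safety tasks).
    decomposed = single_q(entity_states).sum(dim=0)
    # Additive correction compensating for the independence assumption.
    return decomposed + correction(entity_states.flatten())

if __name__ == "__main__":
    single_q, correction = SingleEntityQ(), CorrectionNetwork()
    states = torch.randn(N_ENTITIES, ENTITY_DIM)
    q = corrected_q(single_q, correction, states)
    action = int(torch.argmax(q))  # arbitrator selects the best corrected action
    print(q, action)

In this sketch only the small correction network would be trained on the full-scale problem, which is the appeal of the approach: the single-entity Q-function is reused as-is, and the correction only has to learn the residual interaction effects between entities.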



Acknowledgements

Funding was provided by the Honda Research Institute (US) (Grant No. 124232) and the National Science Foundation Graduate Research Fellowship Program (US) (Grant No. DGE-1656518).

Author information

Corresponding author

Correspondence to Maxime Bouton.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Bouton, M., Julian, K.D., Nakhaei, A. et al. Decomposition methods with deep corrections for reinforcement learning. Auton Agent Multi-Agent Syst 33, 330–352 (2019). https://doi.org/10.1007/s10458-019-09407-z

