Hindsight-aware deep reinforcement learning algorithm for multi-agent systems

  • Original Article
  • Published in International Journal of Machine Learning and Cybernetics

Abstract

Classic reinforcement learning algorithms generate experience through the agent's constant trial and error, which fills the replay buffer with a large number of failure experiences. As a result, the agents can only learn from these low-quality experiences, and in multi-agent systems the problem is even more serious. MADDPG (Multi-Agent Deep Deterministic Policy Gradient) has achieved significant results on multi-agent problems by using a framework of centralized training with decentralized execution, but it does not address the abundance of failure experiences in the replay buffer. In this paper, we propose HMADDPG (Hindsight Multi-Agent Deep Deterministic Policy Gradient) to mitigate the negative impact of failure experiences. HMADDPG adds a hindsight unit that allows the agents to reflect on failures and produce pseudo experiences that tend toward success. These pseudo experiences are stored in the replay buffer alongside the real ones, so the agents can learn from both kinds of experience. We have evaluated our algorithm on a number of environments. The results show that the algorithm can guide agents to learn better strategies and can be applied to multi-agent systems that are cooperative, competitive, or mixed cooperative-competitive.
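Since only the abstract is available here, the core mechanism it describes — turning failed trajectories into "pseudo experiences" that are replayed as successes — can be illustrated with a minimal sketch in the spirit of hindsight experience replay (Andrychowicz et al., ref. 13). The `hindsight_relabel` function, the sparse reward, and the tuple layout below are illustrative assumptions, not the paper's actual HMADDPG interface:

```python
def hindsight_relabel(episode, reward_fn):
    """HER-style relabeling: replay each transition as if the goal the
    agent actually reached at the end of the episode had been the goal
    all along, turning a failed episode into a 'pseudo' success.

    episode: list of (state, action, achieved_goal, original_goal)
    returns: list of (state, action, substituted_goal, reward)
    """
    achieved = episode[-1][2]  # final achieved goal becomes the new goal
    pseudo = []
    for state, action, achieved_goal, _ in episode:
        reward = reward_fn(achieved_goal, achieved)
        pseudo.append((state, action, achieved, reward))
    return pseudo

def sparse_reward(achieved_goal, goal):
    """Sparse reward: 0 on reaching the goal, -1 otherwise."""
    return 0.0 if achieved_goal == goal else -1.0

# A failed episode: the agent aimed for goal 5 but only reached 3.
episode = [(0, 'right', 1, 5), (1, 'right', 2, 5), (2, 'right', 3, 5)]
pseudo = hindsight_relabel(episode, sparse_reward)
# Under the substituted goal 3, the final transition now succeeds,
# so the replay buffer gains an informative, reward-bearing sample.
```

In the multi-agent setting described by the abstract, relabeled transitions like these would be stored next to the real ones, and each agent's critic would train on the mixture.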


References

  1. Luo F, Dong Z, Liang G, Murata J, Xu Z (2019) A distributed electricity trading system in active distribution networks based on multi-agent coalition and blockchain. IEEE Trans Power Syst 34:4097–4108

  2. Sallab AE, Abdou M, Perot E, Yogamani SK (2017) Deep reinforcement learning framework for autonomous driving. Electron Imaging 2017(19):70–76

  3. Silver D, Huang A, Maddison CJ, Guez A, Sifre L, van den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M, Dieleman S, Grewe D, Nham J, Kalchbrenner N, Sutskever I, Lillicrap TP, Leach M, Kavukcuoglu K, Graepel T, Hassabis D (2016) Mastering the game of go with deep neural networks and tree search. Nature 529(7587):484–489

  4. Yang W, Wang X, Farhadi A, Gupta A, Mottaghi R (2019) Visual semantic navigation using scene priors. In: 7th international conference on learning representations

  5. Wu C, Kreidieh A, Vinitsky E, Bayen AM (2017) Emergent behaviors in mixed-autonomy traffic. In: 1st annual conference on robot learning, vol 78, pp 398–407

  6. Liu S, Lever G, Merel J, Tunyasuvunakool S, Heess N, Graepel T (2019) Emergent coordination through competition. In: 7th international conference on learning representations

  7. Lowe R, Wu Y, Tamar A, Harb J, Abbeel P, Mordatch I (2017) Multi-agent actor-critic for mixed cooperative-competitive environments. In: 30th conference on neural information processing systems. pp 6379–6390

  8. Lin LJ (1992) Self-improving reactive agents based on reinforcement learning, planning and teaching. Mach Learn 8:293–321

  9. Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller MA, Fidjeland A, Ostrovski G, Petersen S, Beattie C, Sadik A, Antonoglou I, King H, Kumaran D, Wierstra D, Legg S, Hassabis D (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533

  10. Schaul T, Quan J, Antonoglou I, Silver D (2016) Prioritized experience replay. In: 4th international conference on learning representations

  11. Horgan D, Quan J, Budden D, Barth-Maron G, Hessel M, van Hasselt H, Silver D (2018) Distributed prioritized experience replay. In: 6th international conference on learning representations

  12. Luo J, Li H (2019) Dynamic experience replay

  13. Andrychowicz M, Crow D, Ray A, Schneider J, Fong R, Welinder P, McGrew B, Tobin J, Abbeel P, Zaremba W (2017) Hindsight experience replay. In: 30th conference on neural information processing systems. pp 5048–5058

  14. Fang M, Zhou T, Du Y, Han L, Zhang Z (2019) Curriculum-guided hindsight experience replay. In: 32nd conference on neural information processing systems. pp 12602–12613

  15. Liu H, Trott A, Socher R, Xiong C (2019) Competitive experience replay. In: 7th international conference on learning representations

  16. Bai C, Liu P, Zhao W, Tang X (2019) Guided goal generation for hindsight multi-goal reinforcement learning. Neurocomputing 359:353–367

  17. Lai Y, Wang W, Yang Y, Zhu J, Kuang M (2020) Hindsight planner. In: AAMAS

  18. Ren Z, Dong K, Zhou Y, Liu Q, Peng J (2019) Exploration via hindsight goal generation. In: NeurIPS

  19. de Villiers B, Sabatta D (2020) Hindsight reward shaping in deep reinforcement learning. In: 2020 international SAUPEC/RobMech/PRASA conference, pp 1–7

  20. Prianto E, Kim M, Park JH, Bae JH, Kim JS (2020) Path planning for multi-arm manipulators using deep reinforcement learning: soft actor-critic with hindsight experience replay. Sensors (Basel, Switzerland) 20:5911

  21. Zuo G, Zhao Q, Lu J, Li J (2020) Efficient hindsight reinforcement learning using demonstrations for robotic tasks with sparse rewards. Int J Adv Robot Syst 17

  22. Littman ML (1994) Markov games as a framework for multi-agent reinforcement learning. In: Machine learning, proceedings of the eleventh international conference. pp 157–163

  23. Tampuu A, Matiisen T, Kodelja D, Kuzovkin I, Korjus K, Aru J, Aru J, Vicente R (2017) Multiagent cooperation and competition with deep reinforcement learning. PLoS ONE 12

  24. Sukhbaatar S, Szlam A, Fergus R (2016) Learning multiagent communication with backpropagation. In: 29th conference on neural information processing systems. pp 2244–2252

  25. Pesce E, Montana G (2020) Improving coordination in small-scale multi-agent deep reinforcement learning through memory-driven communication. Mach Learn 109(9–10):1727–1747

  26. Iqbal S, Sha F (2019) Actor-attention-critic for multi-agent reinforcement learning. In: Proceedings of the 36th international conference on machine learning, vol 97, pp. 2961–2970

  27. Foerster JN, Farquhar G, Afouras T, Nardelli N, Whiteson S (2018) Counterfactual multi-agent policy gradients. In: Proceedings of the thirty-second AAAI conference on artificial intelligence, pp 2974–2982

  28. Omidshafiei S, Pazis J, Amato C, How JP, Vian J (2017) Deep decentralized multi-task multi-agent reinforcement learning under partial observability. In: Proceedings of the 34th international conference on machine learning, vol 70, pp 2681–2690

  29. Lillicrap TP, Hunt JJ, Pritzel A, Heess N, Erez T, Tassa Y, Silver D, Wierstra D (2016) Continuous control with deep reinforcement learning. In: Bengio Y, LeCun Y (eds) 4th international conference on learning representations, ICLR 2016, San Juan, Puerto Rico, May 2–4, 2016, conference track proceedings

Acknowledgements

This work is supported by the National Natural Science Foundation of China (Grant no. 61872260).

Author information

Corresponding author

Correspondence to Li Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 Notation Table

See Table 1.

Table 1 Notation in this paper

About this article

Cite this article

Li, C., Wang, L. & Huang, Z. Hindsight-aware deep reinforcement learning algorithm for multi-agent systems. Int. J. Mach. Learn. & Cyber. 13, 2045–2057 (2022). https://doi.org/10.1007/s13042-022-01505-x

