Hindsight-aware deep reinforcement learning algorithm for multi-agent systems

Li, Chengjing; Wang, Li; Huang, Zirong

doi:10.1007/s13042-022-01505-x

Original Article
Published: 29 January 2022

Volume 13, pages 2045–2057, (2022)

International Journal of Machine Learning and Cybernetics

Abstract

Classic reinforcement learning algorithms generate experiences through the agent's constant trial and error, so a large number of failure experiences are stored in the replay buffer and the agents can only learn from these low-quality experiences. The problem is more serious in multi-agent systems. MADDPG (Multi-Agent Deep Deterministic Policy Gradient) has achieved significant results on multi-agent problems by using a framework of centralized training with decentralized execution, but it does not resolve the problem of too many failure experiences in the replay buffer. In this paper, we propose HMADDPG (Hindsight Multi-Agent Deep Deterministic Policy Gradient) to mitigate the negative impact of failure experiences. HMADDPG has a hindsight unit, which allows the agents to reflect and which produces pseudo experiences that tend to succeed. The pseudo experiences are stored in the replay buffer, so that the agents can learn from both kinds of experience. We evaluated the algorithm on a number of environments. The results show that it can guide agents to learn better strategies and can be applied to multi-agent systems that are cooperative, competitive, or mixed cooperative and competitive.
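
The abstract does not say how the hindsight unit builds its pseudo experiences, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a goal-conditioned task with a sparse reward and borrows the "final state as goal" relabeling from hindsight experience replay. All names (`Transition`, `sparse_reward`, `hindsight_relabel`, `ReplayBuffer`) and the tolerance value are hypothetical.

```python
import random
from collections import deque
from dataclasses import dataclass, replace
from typing import Deque, Iterable, List, Tuple

Vec = Tuple[float, ...]


@dataclass(frozen=True)
class Transition:
    """One joint step: every field holds one entry per agent."""
    obs: Tuple[Vec, ...]
    actions: Tuple[Vec, ...]
    next_obs: Tuple[Vec, ...]
    goals: Tuple[Vec, ...]
    rewards: Tuple[float, ...]
    pseudo: bool = False  # True for relabeled (hindsight) experiences


def sparse_reward(achieved: Vec, goal: Vec, tol: float = 0.05) -> float:
    """Assumed reward: 0 when the goal is reached within tol, otherwise -1."""
    dist = sum((a - g) ** 2 for a, g in zip(achieved, goal)) ** 0.5
    return 0.0 if dist <= tol else -1.0


def hindsight_relabel(episode: List[Transition]) -> List[Transition]:
    """Assumed relabeling rule: treat each agent's final position as its goal."""
    new_goals = episode[-1].next_obs
    relabeled = []
    for step in episode:
        rewards = tuple(
            sparse_reward(pos, goal) for pos, goal in zip(step.next_obs, new_goals)
        )
        relabeled.append(replace(step, goals=new_goals, rewards=rewards, pseudo=True))
    return relabeled


class ReplayBuffer:
    """Shared buffer holding both real and pseudo experiences."""

    def __init__(self, capacity: int) -> None:
        self._data: Deque[Transition] = deque(maxlen=capacity)

    def extend(self, transitions: Iterable[Transition]) -> None:
        self._data.extend(transitions)

    def sample(self, batch_size: int, rng: random.Random) -> List[Transition]:
        return rng.sample(list(self._data), min(batch_size, len(self._data)))

    def __len__(self) -> int:
        return len(self._data)


def make_failed_episode() -> List[Transition]:
    """Toy two-agent episode in which neither agent reaches its original goal."""
    goals = ((1.0, 1.0), (-1.0, -1.0))
    path = [
        ((0.0, 0.0), (0.0, 0.0)),
        ((0.1, 0.0), (0.0, 0.1)),
        ((0.2, 0.0), (0.0, 0.2)),
    ]
    episode = []
    for obs, nxt in zip(path, path[1:]):
        # Action is taken to be the displacement of each agent.
        actions = tuple(
            tuple(n - o for n, o in zip(p_next, p_now))
            for p_next, p_now in zip(nxt, obs)
        )
        rewards = tuple(sparse_reward(p, g) for p, g in zip(nxt, goals))
        episode.append(Transition(obs, actions, nxt, goals, rewards))
    return episode


if __name__ == "__main__":
    buffer = ReplayBuffer(capacity=1000)
    real = make_failed_episode()
    buffer.extend(real)                     # real (failed) experiences
    buffer.extend(hindsight_relabel(real))  # pseudo experiences
    batch = buffer.sample(4, random.Random(0))
    print(sum(t.pseudo for t in batch), "pseudo of", len(batch), "sampled")
```

In this toy episode every real transition carries a reward of -1 for both agents, while the last relabeled transition carries a reward of 0, so a learner sampling from the mixed buffer sees at least some successful outcomes. In a MADDPG-style setup the centralized critic would consume the joint fields of each sampled transition; that training step is omitted here.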

Acknowledgements

This work is supported by the National Natural Science Foundation of China (Grant no. 61872260).

Author information

Authors and Affiliations

College of Data Science, Taiyuan University of Technology, Jinzhong, 030600, Shanxi, China
Chengjing Li, Li Wang & Zirong Huang

Corresponding author

Correspondence to Li Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 Notation Table

See Table 1.

Table 1 Notation in this paper

About this article

Cite this article

Li, C., Wang, L. & Huang, Z. Hindsight-aware deep reinforcement learning algorithm for multi-agent systems. Int. J. Mach. Learn. & Cyber. 13, 2045–2057 (2022). https://doi.org/10.1007/s13042-022-01505-x

Received: 01 July 2021
Accepted: 06 January 2022
Published: 29 January 2022
Issue Date: July 2022
DOI: https://doi.org/10.1007/s13042-022-01505-x
