Abstract
With the breakthrough of AlphaGo, deep reinforcement learning has become a recognized technique for solving sequential decision-making problems. Despite its reputation, the data inefficiency caused by its trial-and-error learning mechanism makes deep reinforcement learning difficult to apply in a wide range of areas. Many methods have been developed for sample-efficient deep reinforcement learning, such as environment modelling, experience transfer, and distributed modifications, among which distributed deep reinforcement learning has shown its potential in various applications, such as human-computer gaming and intelligent transportation. In this paper, we summarize the state of this exciting field by comparing classical distributed deep reinforcement learning methods and studying the components that are important for efficient distributed learning, covering settings from single-player single-agent distributed deep reinforcement learning to the most complex multiple-player multiple-agent case. Furthermore, we review recently released toolboxes that help realize distributed deep reinforcement learning without requiring many modifications to non-distributed algorithms. By analysing their strengths and weaknesses, a multi-player multi-agent distributed deep reinforcement learning toolbox is developed and released, and is further validated on Wargame, a complex environment, demonstrating the usability of the proposed toolbox for multi-player multi-agent distributed deep reinforcement learning in complex games. Finally, we point out challenges and future trends, hoping that this brief review can provide a guide or a spark for researchers interested in distributed deep reinforcement learning.
Acknowledgements
This work was supported by Open Fund/Postdoctoral Fund of the Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, China (No. CASIA-KFKT-XDA27040809).
Ethics declarations
The authors declare that they have no conflict of interest regarding this work.
Additional information
Colored figures are available in the online version at https://link.springer.com/journal/11633
Qiyue Yin received the Ph.D. degree in pattern recognition and intelligence systems from the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences (CASIA), China in 2017. He is currently an associate professor at CASIA, China.
His research interests include machine learning, pattern recognition and artificial intelligence on games.
Tongtong Yu received the master’s degree in computer science and technology from Beijing University of Technology, China in 2020. She is currently an engineer at Institute of Automation, Chinese Academy of Sciences (CASIA), China.
Her research interests include machine learning and artificial intelligence on games.
Shengqi Shen received the master’s degree in control science and engineering from Beijing University of Chemical Technology, China in 2018. He is currently an engineer at Institute of Automation, Chinese Academy of Sciences (CASIA), China.
His research interests include machine learning and decision making in games.
Jun Yang received the Ph.D. degree in control science and engineering from Tsinghua University, China in 2011. He is currently an associate professor with the Department of Automation, Tsinghua University, China.
His research interests include multiagent reinforcement learning and game theory.
Meijing Zhao received the Ph.D. degree in pattern recognition and intelligence systems from Integrated Information System Research Center, Institute of Automation, Chinese Academy of Sciences (CASIA), China in 2016. She is currently an associate professor at CASIA, China.
Her research interests include semantic information processing, knowledge representation and reasoning.
Wancheng Ni received the Ph.D. degree in contemporary integrated manufacturing systems from Department of Automation, Tsinghua University, China in 2007. She is currently a professor at Institute of Automation, Chinese Academy of Sciences (CASIA), China.
Her research interests include information processing and knowledge discovery, and group intelligent decision-making platforms and evaluation.
Kaiqi Huang received the Ph.D. degree in communication and information processing from Southeast University, China in 2004. He is currently a professor at Institute of Automation, Chinese Academy of Sciences (CASIA), China.
His research interests include visual surveillance, image understanding, pattern recognition, human-computer gaming and biological based vision.
Bin Liang received the Ph.D. degree in precision instruments and mechanology from Tsinghua University, China in 1994. He is currently a professor with the Department of Automation, Tsinghua University, China.
His research interests include artificial intelligence, anomaly detection, space robotics, and fault-tolerant control.
Liang Wang received the Ph.D. degree in pattern recognition and intelligence systems from the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences (CASIA), China in 2004. He is currently a professor at CASIA, China.
His research interests include computer vision, pattern recognition, machine learning, and data mining.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Yin, Q., Yu, T., Shen, S. et al. Distributed Deep Reinforcement Learning: A Survey and a Multi-player Multi-agent Learning Toolbox. Mach. Intell. Res. 21, 411–430 (2024). https://doi.org/10.1007/s11633-023-1454-4