Abstract
With the breakthrough of AlphaGo, deep reinforcement learning has become a recognized technique for solving sequential decision-making problems. Despite its reputation, the data inefficiency caused by its trial-and-error learning mechanism makes deep reinforcement learning difficult to apply in a wide range of areas. Many methods have been developed for sample-efficient deep reinforcement learning, such as environment modelling, experience transfer, and distributed modifications, among which distributed deep reinforcement learning has shown its potential in various applications, such as human-computer gaming and intelligent transportation. In this paper, we summarize the state of this exciting field by comparing classical distributed deep reinforcement learning methods and studying the components that are important for efficient distributed learning, covering settings from single-player single-agent distributed deep reinforcement learning to the most complex multiple-player multiple-agent case. Furthermore, we review recently released toolboxes that help realize distributed deep reinforcement learning without requiring many modifications to non-distributed algorithms. By analysing their strengths and weaknesses, a multi-player multi-agent distributed deep reinforcement learning toolbox is developed and released, and is further validated on Wargame, a complex environment, demonstrating the usability of the proposed toolbox for multi-player multi-agent distributed deep reinforcement learning in complex games. Finally, we point out challenges and future trends, hoping that this brief review can provide a guide or a spark for researchers interested in distributed deep reinforcement learning.
Acknowledgements
This work was supported by Open Fund/Postdoctoral Fund of the Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, China (No. CASIA-KFKT-XDA27040809).
Ethics declarations
The authors declare that they have no conflict of interest regarding this work.
Additional information
Colored figures are available in the online version at https://link.springer.com/journal/11633
Qiyue Yin received the Ph.D. degree in pattern recognition and intelligence systems from the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences (CASIA), China in 2017. He is currently an associate professor at CASIA, China.
His research interests include machine learning, pattern recognition and artificial intelligence on games.
Tongtong Yu received the master’s degree in computer science and technology from Beijing University of Technology, China in 2020. She is currently an engineer at Institute of Automation, Chinese Academy of Sciences (CASIA), China.
Her research interests include machine learning and artificial intelligence on games.
Shengqi Shen received the master’s degree in control science and engineering from Beijing University of Chemical Technology, China in 2018. He is currently an engineer at Institute of Automation, Chinese Academy of Sciences (CASIA), China.
His research interests include machine learning and decision making in games.
Jun Yang received the Ph.D. degree in control science and engineering from Tsinghua University, China in 2011. He is currently an associate professor with the Department of Automation, Tsinghua University, China.
His research interests include multiagent reinforcement learning and game theory.
Meijing Zhao received the Ph.D. degree in pattern recognition and intelligence systems from Integrated Information System Research Center, Institute of Automation, Chinese Academy of Sciences (CASIA), China in 2016. She is currently an associate professor at CASIA, China.
Her research interests include semantic information processing, knowledge representation and reasoning.
Wancheng Ni received the Ph.D. degree in contemporary integrated manufacturing systems from Department of Automation, Tsinghua University, China in 2007. She is currently a professor at Institute of Automation, Chinese Academy of Sciences (CASIA), China.
Her research interests include information processing and knowledge discovery, and group intelligent decision-making platforms and evaluation.
Kaiqi Huang received the Ph.D. degree in communication and information processing from Southeast University, China in 2004. He is currently a professor at Institute of Automation, Chinese Academy of Sciences (CASIA), China.
His research interests include visual surveillance, image understanding, pattern recognition, human-computer gaming and biological based vision.
Bin Liang received the Ph.D. degree in precision instruments and mechanology from Tsinghua University, China in 1994. He is currently a professor with the Department of Automation, Tsinghua University, China.
His research interests include artificial intelligence, anomaly detection, space robotics, and fault-tolerant control.
Liang Wang received the Ph.D. degree in pattern recognition and intelligence systems from the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences (CASIA), China in 2004. He is currently a professor at CASIA, China.
His research interests include computer vision, pattern recognition, machine learning, and data mining.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Yin, Q., Yu, T., Shen, S. et al. Distributed Deep Reinforcement Learning: A Survey and a Multi-player Multi-agent Learning Toolbox. Mach. Intell. Res. 21, 411–430 (2024). https://doi.org/10.1007/s11633-023-1454-4