Abstract
Autonomous systems embedded in our physical world need real-world interaction in order to function, but they also depend on it as a means to learn. This is the essence of artificial Embodied Cognition, in which machine intelligence is tightly coupled to sensors and effectors and where learning happens from continually experiencing the dynamic world as time-series data, received and processed from a situated and contextually-relative perspective. From this stream, our engineered agents must perceptually discriminate, deal with noise and uncertainty, recognize the causal influence of their actions (sometimes with significant and variable temporal lag), pursue multiple and changing goals that are often incompatible with each other, and make decisions under time pressure. To further complicate matters, unpredictability caused by the actions of other adaptive agents makes this experiential data stochastic and statistically non-stationary. Reinforcement Learning approaches to these problems often oversimplify many of these aspects, e.g., by assuming stationarity, collapsing multiple goals into a single reward signal, using repetitive discrete training episodes, or removing real-time requirements. Because we are interested in developing dependable and trustworthy autonomy, we have been studying these problems by retaining all these inherent complexities and only simplifying the agent’s environmental bandwidth requirements. The Multi-Agent Research Basic Learning Environment (MARBLE) is a computational framework for studying the nuances of cooperative, competitive, and adversarial learning, where emergent behaviors can be better understood through carefully controlled experiments. In particular, we are using it to evaluate a novel reinforcement learning long-term memory data structure based on probabilistic suffix trees. Here, we describe this research methodology, and report on the results of some early experiments.
Notes
1. Posterior probabilities are computed from maximum entropy priors initialized by setting the alpha parameter in a multi-modal Dirichlet distribution.
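As a rough illustration of the kind of computation this note describes, the sketch below shows how a single alpha parameter can act as a uniform (maximum entropy) prior over next-symbol counts. It assumes a symmetric Dirichlet prior over a fixed symbol set and uses the standard posterior mean; the function name `posterior_predictive`, the symbol set, and the observation string are hypothetical and are not taken from MARBLE or the chapter.

```python
from collections import Counter


def posterior_predictive(counts, symbols, alpha=1.0):
    """Posterior mean of a categorical distribution under a symmetric
    Dirichlet(alpha) prior: (count + alpha) / (total + alpha * K).

    Assumption: one shared alpha for all K symbols, so that with no
    observations every symbol is equally likely.
    """
    total = sum(counts.get(s, 0) for s in symbols)
    denom = total + alpha * len(symbols)
    return {s: (counts.get(s, 0) + alpha) / denom for s in symbols}


# Hypothetical symbol set and observation sequence, for illustration only.
symbols = ["a", "b", "c"]

# With no observations, the estimate is uniform over the symbols.
prior = posterior_predictive(Counter(), symbols)

# After observing "a", "a", "b", probability mass shifts toward the evidence.
observed = Counter("aab")
post = posterior_predictive(observed, symbols)
```

A smaller alpha lets the observed counts dominate sooner, while a larger alpha keeps the estimate closer to uniform for longer.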
Acknowledgements and Disclaimer
The authors wish to thank Jason F. Kutarnia and Brittany A. Tracy for their assistance with this research. Approved for Public Release; Distribution Unlimited. Case Number 18-1473.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Silvey, P.E., Norman, M.D. (2018). Embodied Cognition and Multi-Agent Behavioral Emergence. In: Morales, A., Gershenson, C., Braha, D., Minai, A., Bar-Yam, Y. (eds) Unifying Themes in Complex Systems IX. ICCS 2018. Springer Proceedings in Complexity. Springer, Cham. https://doi.org/10.1007/978-3-319-96661-8_20
DOI: https://doi.org/10.1007/978-3-319-96661-8_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-96660-1
Online ISBN: 978-3-319-96661-8
eBook Packages: Physics and Astronomy (R0)