Abstract
Developments in reinforcement learning (RL) have allowed algorithms to achieve impressive performance in complex but largely static problems. In contrast, biological learning seems to value efficient adaptation to a constantly changing world. Here we build on a recently proposed model of neuronal learning that suggests neurons predict their own future activity to optimize their energy balance. That work proposed a neuronal learning rule that uses presynaptic input to modulate prediction error. Here we argue that an analogous RL rule would use action probability to modulate reward prediction error. We show that this modulation makes the agent more sensitive to negative experiences and more careful in forming preferences: features that facilitate adaptation to change. We embed the proposed rule in both tabular and deep-Q-network RL algorithms, and find that it outperforms conventional algorithms in simple but highly dynamic tasks. It also exhibits a "paradox of choice" effect that has been observed in humans. The new rule may encapsulate a core principle of biological intelligence: an important component of human-like learning and adaptation, with both its benefits and trade-offs.
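To make the idea concrete, the core mechanism can be illustrated with a minimal sketch of a tabular Q-learning update in which the temporal-difference error is scaled by the probability of the chosen action under a softmax policy. This is an illustrative assumption about the form of the modulation, not the paper's exact rule; the function names and parameters are hypothetical.

```python
import numpy as np

def softmax(q, tau=1.0):
    """Softmax action probabilities from a row of Q-values (numerically stable)."""
    z = (q - q.max()) / tau
    e = np.exp(z)
    return e / e.sum()

def modulated_q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9, tau=1.0):
    """One tabular Q-learning step where the TD error is weighted by the
    softmax probability of the action taken (hypothetical modulation).

    Actions the agent strongly prefers (high probability) produce large
    updates, so a negative outcome on a preferred action is felt strongly,
    while rarely chosen actions shift value estimates only slowly.
    """
    pi = softmax(Q[s], tau)                      # current preference over actions in s
    td_error = r + gamma * Q[s_next].max() - Q[s, a]
    Q[s, a] += alpha * pi[a] * td_error          # probability-modulated update
    return td_error
```

With uniform initial Q-values the modulation factor is 1/|actions|, so early updates are conservative; as a preference forms, updates to the preferred action grow, which is one way such a rule could yield both faster unlearning of bad choices and slower, more careful preference formation.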
References
Almeida, L.B.: A learning rule for asynchronous perceptrons with feedback in a combinatorial environment. In: Artificial Neural Networks: Concept Learning, pp. 102–111. IEEE Press, January 1990
Auer, P.: Using confidence bounds for exploitation-exploration trade-offs. J. Mach. Learn. Res. 3(Nov), 397–422 (2002)
Baldi, P., Pineda, F.: Contrastive learning and neural oscillations. Neural Comput. 3(4), 526–545 (1991). https://doi.org/10.1162/neco.1991.3.4.526
Berg, E.A.: A simple objective technique for measuring flexibility in thinking. J. Gen. Psychol. 39(1), 15–22 (1948). https://doi.org/10.1080/00221309.1948.9918159
Botvinick, M., Ritter, S., Wang, J.X., Kurth-Nelson, Z., Blundell, C., Hassabis, D.: Reinforcement learning, fast and slow. Trends Cogn. Sci. 23(5), 408–422 (2019). https://doi.org/10.1016/j.tics.2019.02.006
Caccia, M., et al.: Online Fast Adaptation and Knowledge Accumulation (OSAKA): a new approach to continual learning. In: Advances in Neural Information Processing Systems, vol. 33, pp. 16532–16545. Curran Associates, Inc. (2020)
Chalmers, E., Contreras, E.B., Robertson, B., Luczak, A., Gruber, A.: Context-switching and adaptation: brain-inspired mechanisms for handling environmental changes. In: 2016 International Joint Conference on Neural Networks (IJCNN), pp. 3522–3529, July 2016
Chalmers, E., Luczak, A., Gruber, A.J.: Computational properties of the hippocampus increase the efficiency of goal-directed foraging through hierarchical reinforcement learning. Front. Comput. Neurosci. 10, 128 (2016)
Chernev, A.: When more is less and less is more: the role of ideal point availability and assortment in consumer choice. J. Consum. Res. 30(2), 170–183 (2003). https://doi.org/10.1086/376808
Chernev, A., Böckenholt, U., Goodman, J.: Choice overload: a conceptual review and meta-analysis. J. Consum. Psychol. 25(2), 333–358 (2015). https://doi.org/10.1016/j.jcps.2014.08.002
Dorfman, R., Shenfeld, I., Tamar, A.: Offline meta reinforcement learning – identifiability challenges and effective data collection strategies. In: Advances in Neural Information Processing Systems, vol. 34, pp. 4607–4618. Curran Associates, Inc. (2021)
Dudik, M., Langford, J., Li, L.: Doubly robust policy evaluation and learning, May 2011. https://doi.org/10.48550/arXiv.1103.4601
Fallah, A., Georgiev, K., Mokhtari, A., Ozdaglar, A.: On the convergence theory of debiased model-agnostic meta-reinforcement learning. In: Advances in Neural Information Processing Systems, vol. 34, pp. 3096–3107. Curran Associates, Inc. (2021)
Harrison, J., Sharma, A., Finn, C., Pavone, M.: Continuous meta-learning without tasks. In: Advances in Neural Information Processing Systems, vol. 33, pp. 17571–17581. Curran Associates, Inc. (2020)
Hick, W.E.: On the rate of gain of information. Q. J. Exp. Psychol. 4(1), 11–26 (1952). https://doi.org/10.1080/17470215208416600
Horvitz, D.G., Thompson, D.J.: A generalization of sampling without replacement from a finite universe. J. Am. Stat. Assoc. 47(260), 663–685 (1952). https://doi.org/10.1080/01621459.1952.10483446
Kwon, J., Efroni, Y., Caramanis, C., Mannor, S.: Reinforcement learning in reward-mixing MDPs. In: Advances in Neural Information Processing Systems, vol. 34, pp. 2253–2264. Curran Associates, Inc. (2021)
Liu, H., Long, M., Wang, J., Wang, Y.: Learning to adapt to evolving domains. In: Advances in Neural Information Processing Systems, vol. 33, pp. 22338–22348. Curran Associates, Inc. (2020)
Luczak, A., Kubo, Y.: Predictive neuronal adaptation as a basis for consciousness. Front. Syst. Neurosci. 15, 767461 (2021). https://doi.org/10.3389/fnsys.2021.767461
Luczak, A., McNaughton, B.L., Kubo, Y.: Neurons learn by predicting future activity. Nat. Mach. Intell. 4(1), 62–72 (2022). https://doi.org/10.1038/s42256-021-00430-y
Milner, B.: Effects of different brain lesions on card sorting: the role of the frontal lobes. Arch. Neurol. 9(1), 90–100 (1963). https://doi.org/10.1001/archneur.1963.00460070100010
Neftci, E.O., Averbeck, B.B.: Reinforcement learning in artificial and biological systems. Nat. Mach. Intell. 1(3), 133–143 (2019). https://doi.org/10.1038/s42256-019-0025-4
Padakandla, S., Prabuchandran, K.J., Bhatnagar, S.: Reinforcement learning algorithm for non-stationary environments. Appl. Intell. 50(11), 3590–3606 (2020). https://doi.org/10.1007/s10489-020-01758-5
Scellier, B., Bengio, Y.: Equilibrium propagation: bridging the gap between energy-based models and backpropagation. Front. Comput. Neurosci. 11, 24 (2017)
Schwartz, B., Kliban, K.: The Paradox of Choice: Why More Is Less. Brilliance Audio, Grand Rapids, Mich., unabridged edition, April 2014
Steinke, A., Lange, F., Kopp, B.: Parallel model-based and model-free reinforcement learning for card sorting performance. Sci. Rep. 10(1), 15464 (2020). https://doi.org/10.1038/s41598-020-72407-7
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, 2nd edn. MIT Press, Cambridge (2018)
Tang, Y., Kozuno, T., Rowland, M., Munos, R., Valko, M.: Unifying gradient estimators for meta-reinforcement learning via off-policy evaluation. In: Advances in Neural Information Processing Systems, vol. 34, pp. 5303–5315. Curran Associates, Inc. (2021)
Wang, J.X.: Prefrontal cortex as a meta-reinforcement learning system. Nat. Neurosci. 21(6), 860–868 (2018). https://doi.org/10.1038/s41593-018-0147-8
Wang, J.X., et al.: Learning to reinforcement learn, January 2017
Williams, R.J.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn. 8(3), 229–256 (1992). https://doi.org/10.1007/BF00992696
Zhao, M., Liu, Z., Luan, S., Zhang, S., Precup, D., Bengio, Y.: A consciousness-inspired planning agent for model-based reinforcement learning. In: Advances in Neural Information Processing Systems, vol. 34, pp. 1569–1581. Curran Associates, Inc. (2021)
Acknowledgements
This work was supported by Compute Canada, the Natural Sciences and Engineering Research Council of Canada (NSERC), and the Canadian Institutes of Health Research (CIHR) grants to Artur Luczak.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Chalmers, E., Luczak, A. (2023). Reinforcement Learning with Brain-Inspired Modulation Improves Adaptation to Environmental Changes. In: Rutkowski, L., Scherer, R., Korytkowski, M., Pedrycz, W., Tadeusiewicz, R., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2023. Lecture Notes in Computer Science(), vol 14125. Springer, Cham. https://doi.org/10.1007/978-3-031-42505-9_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42504-2
Online ISBN: 978-3-031-42505-9