Abstract
Developments in reinforcement learning (RL) have allowed algorithms to achieve impressive performance in complex but largely static problems. In contrast, biological learning seems to value efficient adaptation to a constantly changing world. Here we build on a recently proposed model of neuronal learning that suggests neurons predict their own future activity to optimize their energy balance. That work proposed a neuronal learning rule that uses presynaptic input to modulate prediction error. Here we argue that an analogous RL rule would use action probability to modulate reward prediction error. We show that this modulation makes the agent more sensitive to negative experiences and more careful in forming preferences: features that facilitate adaptation to change. We embed the proposed rule in both tabular and deep-Q-network RL algorithms, and find that it outperforms conventional algorithms in simple but highly dynamic tasks. It also exhibits a "paradox of choice" effect that has been observed in humans. The new rule may encapsulate a core principle of biological intelligence: an important component of human-like learning and adaptation, with both its benefits and trade-offs.
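To make the idea concrete, the core mechanism can be illustrated with a minimal sketch of a tabular Q-learning update in which the temporal-difference error is scaled by the probability of the chosen action under a softmax policy. This is an illustrative assumption about the form of the modulation, not the paper's exact rule; the function names and parameters are hypothetical.

```python
import numpy as np

def softmax(q, tau=1.0):
    """Softmax action probabilities from a row of Q-values (numerically stable)."""
    z = (q - q.max()) / tau
    e = np.exp(z)
    return e / e.sum()

def modulated_q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9, tau=1.0):
    """One tabular Q-learning step where the TD error is weighted by the
    softmax probability of the action taken (hypothetical modulation).

    Actions the agent strongly prefers (high probability) produce large
    updates, so a negative outcome on a preferred action is felt strongly,
    while rarely chosen actions shift value estimates only slowly.
    """
    pi = softmax(Q[s], tau)                      # current preference over actions in s
    td_error = r + gamma * Q[s_next].max() - Q[s, a]
    Q[s, a] += alpha * pi[a] * td_error          # probability-modulated update
    return td_error
```

With uniform initial Q-values the modulation factor is 1/|actions|, so early updates are conservative; as a preference forms, updates to the preferred action grow, which is one way such a rule could yield both faster unlearning of bad choices and slower, more careful preference formation.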
References
Almeida, L.B.: A learning rule for asynchronous perceptrons with feedback in a combinatorial environment. In: Artificial Neural Networks: Concept Learning, pp. 102–111. IEEE Press, January 1990
Auer, P.: Using confidence bounds for exploitation-exploration trade-offs. J. Mach. Learn. Res. 3(Nov), 397–422 (2002)
Baldi, P., Pineda, F.: Contrastive learning and neural oscillations. Neural Comput. 3(4), 526–545 (1991). https://doi.org/10.1162/neco.1991.3.4.526
Berg, E.A.: A simple objective technique for measuring flexibility in thinking. J. Gen. Psychol. 39(1), 15–22 (1948). https://doi.org/10.1080/00221309.1948.9918159
Botvinick, M., Ritter, S., Wang, J.X., Kurth-Nelson, Z., Blundell, C., Hassabis, D.: Reinforcement learning, fast and slow. Trends Cogn. Sci. 23(5), 408–422 (2019). https://doi.org/10.1016/j.tics.2019.02.006
Caccia, M., et al.: Online Fast Adaptation and Knowledge Accumulation (OSAKA): a new approach to continual learning. In: Advances in Neural Information Processing Systems, vol. 33, pp. 16532–16545. Curran Associates, Inc. (2020)
Chalmers, E., Contreras, E.B., Robertson, B., Luczak, A., Gruber, A.: Context-switching and adaptation: brain-inspired mechanisms for handling environmental changes. In: 2016 International Joint Conference on Neural Networks (IJCNN), pp. 3522–3529, July 2016
Chalmers, E., Luczak, A., Gruber, A.J.: Computational properties of the hippocampus increase the efficiency of goal-directed foraging through hierarchical reinforcement learning. Front. Comput. Neurosci. 10, 128 (2016)
Chernev, A.: When more is less and less is more: the role of ideal point availability and assortment in consumer choice. J. Consum. Res. 30(2), 170–183 (2003). https://doi.org/10.1086/376808
Chernev, A., Böckenholt, U., Goodman, J.: Choice overload: a conceptual review and meta-analysis. J. Consum. Psychol. 25(2), 333–358 (2015). https://doi.org/10.1016/j.jcps.2014.08.002
Dorfman, R., Shenfeld, I., Tamar, A.: Offline meta reinforcement learning – identifiability challenges and effective data collection strategies. In: Advances in Neural Information Processing Systems, vol. 34, pp. 4607–4618. Curran Associates, Inc. (2021)
Dudik, M., Langford, J., Li, L.: Doubly robust policy evaluation and learning, May 2011. https://doi.org/10.48550/arXiv.1103.4601
Fallah, A., Georgiev, K., Mokhtari, A., Ozdaglar, A.: On the convergence theory of debiased model-agnostic meta-reinforcement learning. In: Advances in Neural Information Processing Systems, vol. 34, pp. 3096–3107. Curran Associates, Inc. (2021)
Harrison, J., Sharma, A., Finn, C., Pavone, M.: Continuous meta-learning without tasks. In: Advances in Neural Information Processing Systems, vol. 33, pp. 17571–17581. Curran Associates, Inc. (2020)
Hick, W.E.: On the rate of gain of information. Q. J. Exp. Psychol. 4(1), 11–26 (1952). https://doi.org/10.1080/17470215208416600
Horvitz, D.G., Thompson, D.J.: A generalization of sampling without replacement from a finite universe. J. Am. Stat. Assoc. 47(260), 663–685 (1952). https://doi.org/10.1080/01621459.1952.10483446
Kwon, J., Efroni, Y., Caramanis, C., Mannor, S.: Reinforcement learning in reward-mixing MDPs. In: Advances in Neural Information Processing Systems, vol. 34, pp. 2253–2264. Curran Associates, Inc. (2021)
Liu, H., Long, M., Wang, J., Wang, Y.: Learning to adapt to evolving domains. In: Advances in Neural Information Processing Systems, vol. 33, pp. 22338–22348. Curran Associates, Inc. (2020)
Luczak, A., Kubo, Y.: Predictive neuronal adaptation as a basis for consciousness. Front. Syst. Neurosci. 15, 767461 (2021). https://doi.org/10.3389/fnsys.2021.767461
Luczak, A., McNaughton, B.L., Kubo, Y.: Neurons learn by predicting future activity. Nat. Mach. Intell. 4(1), 62–72 (2022). https://doi.org/10.1038/s42256-021-00430-y
Milner, B.: Effects of different brain lesions on card sorting: the role of the frontal lobes. Arch. Neurol. 9(1), 90–100 (1963). https://doi.org/10.1001/archneur.1963.00460070100010
Neftci, E.O., Averbeck, B.B.: Reinforcement learning in artificial and biological systems. Nat. Mach. Intell. 1(3), 133–143 (2019). https://doi.org/10.1038/s42256-019-0025-4
Padakandla, S., Prabuchandran, K.J., Bhatnagar, S.: Reinforcement learning algorithm for non-stationary environments. Appl. Intell. 50(11), 3590–3606 (2020). https://doi.org/10.1007/s10489-020-01758-5
Scellier, B., Bengio, Y.: Equilibrium propagation: bridging the gap between energy-based models and backpropagation. Front. Comput. Neurosci. 11, 24 (2017)
Schwartz, B., Kliban, K.: The Paradox of Choice: Why More Is Less. Brilliance Audio, Grand Rapids, Mich., unabridged edition, April 2014
Steinke, A., Lange, F., Kopp, B.: Parallel model-based and model-free reinforcement learning for card sorting performance. Sci. Rep. 10(1), 15464 (2020). https://doi.org/10.1038/s41598-020-72407-7
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, 2nd edn. MIT Press, Cambridge (2018)
Tang, Y., Kozuno, T., Rowland, M., Munos, R., Valko, M.: Unifying gradient estimators for meta-reinforcement learning via off-policy evaluation. In: Advances in Neural Information Processing Systems, vol. 34, pp. 5303–5315. Curran Associates, Inc. (2021)
Wang, J.X.: Prefrontal cortex as a meta-reinforcement learning system. Nat. Neurosci. 21(6), 860–868 (2018). https://doi.org/10.1038/s41593-018-0147-8
Wang, J.X., et al.: Learning to reinforcement learn, January 2017
Williams, R.J.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn. 8(3), 229–256 (1992). https://doi.org/10.1007/BF00992696
Zhao, M., Liu, Z., Luan, S., Zhang, S., Precup, D., Bengio, Y.: A consciousness-inspired planning agent for model-based reinforcement learning. In: Advances in Neural Information Processing Systems, vol. 34, pp. 1569–1581. Curran Associates, Inc. (2021)
Acknowledgements
This work was supported by Compute Canada, the Natural Sciences and Engineering Research Council of Canada (NSERC), and the Canadian Institutes of Health Research (CIHR) grants to Artur Luczak.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Chalmers, E., Luczak, A. (2023). Reinforcement Learning with Brain-Inspired Modulation Improves Adaptation to Environmental Changes. In: Rutkowski, L., Scherer, R., Korytkowski, M., Pedrycz, W., Tadeusiewicz, R., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2023. Lecture Notes in Computer Science(), vol 14125. Springer, Cham. https://doi.org/10.1007/978-3-031-42505-9_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42504-2
Online ISBN: 978-3-031-42505-9