Reinforcement Learning with Brain-Inspired Modulation Improves Adaptation to Environmental Changes

  • Conference paper

Artificial Intelligence and Soft Computing (ICAISC 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14125)


Abstract

Developments in reinforcement learning (RL) have allowed algorithms to achieve impressive performance in complex but largely static problems. In contrast, biological learning seems to value efficient adaptation to a constantly changing world. Here we build on a recently proposed model of neuronal learning that suggests neurons predict their own future activity to optimize their energy balance. That work proposed a neuronal learning rule that uses presynaptic input to modulate prediction error. Here we argue that an analogous RL rule would use action probability to modulate reward prediction error. We show that this modulation makes the agent more sensitive to negative experiences and more careful in forming preferences: features that facilitate adaptation to change. We embed the proposed rule in both tabular and deep Q-network RL algorithms, and find that it outperforms conventional algorithms in simple but highly dynamic tasks. It also exhibits a “paradox of choice” effect that has been observed in humans. The new rule may encapsulate a core principle of biological intelligence: an important component of human-like learning and adaptation, with both its benefits and trade-offs.
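For intuition, the sketch below shows one way such a modulation could be embedded in a tabular Q-learning update. The specific functional form (scaling the conventional reward prediction error by the softmax probability of the chosen action) and the helper names (softmax, modulated_q_update) are illustrative assumptions based on the abstract, not the exact rule from the paper.

```python
import numpy as np

def softmax(q_row, temperature=1.0):
    """Action probabilities from one row of the Q-table (softmax policy)."""
    z = (q_row - q_row.max()) / temperature
    p = np.exp(z)
    return p / p.sum()

def modulated_q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95, temperature=1.0):
    """One Q-learning step with the reward prediction error modulated by
    the probability of the action that was taken (assumed multiplicative form)."""
    pi = softmax(Q[s], temperature)                 # current action probabilities in state s
    delta = r + gamma * Q[s_next].max() - Q[s, a]   # conventional reward prediction error
    Q[s, a] += alpha * pi[a] * delta                # modulation: scale the error by pi(a|s)
    return Q

# Example usage: 5 states, 2 actions
Q = np.zeros((5, 2))
Q = modulated_q_update(Q, s=0, a=1, r=1.0, s_next=2)
```

Under this assumed form, updates for tentatively chosen (low-probability) actions are small, so preferences form cautiously, while a negative surprise after a strongly preferred action produces a large correction: qualitatively the behaviour described in the abstract.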



Acknowledgements

This work was supported by Compute Canada, the Natural Sciences and Engineering Research Council of Canada (NSERC), and the Canadian Institutes of Health Research (CIHR) grants to Artur Luczak.

Author information

Correspondence to Eric Chalmers.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Chalmers, E., Luczak, A. (2023). Reinforcement Learning with Brain-Inspired Modulation Improves Adaptation to Environmental Changes. In: Rutkowski, L., Scherer, R., Korytkowski, M., Pedrycz, W., Tadeusiewicz, R., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2023. Lecture Notes in Computer Science, vol. 14125. Springer, Cham. https://doi.org/10.1007/978-3-031-42505-9_3

  • DOI: https://doi.org/10.1007/978-3-031-42505-9_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-42504-2

  • Online ISBN: 978-3-031-42505-9

  • eBook Packages: Computer Science, Computer Science (R0)
