Stimulus Selection in a Q-learning Model Using Fisher Information and Monte Carlo Simulation

  • Original Paper
Computational Brain & Behavior

Abstract

Reinforcement learning models have been extensively studied for decision-making tasks with reward feedback. However, in designing an experiment to collect data for Q-learning models, the quantitative effect of a presented stimulus on the estimation precision of participant parameters has generally not been considered. That is, the lack of a mathematical framework has prevented researchers from designing an optimal experiment. To tackle this problem, this study analytically derives the Fisher information. Furthermore, this study formulates a stochastic representation of the Q-learning model, which is one of the most commonly applied reinforcement learning models. With this derivation, a two-step procedure is proposed to select the optimal stimuli in terms of estimation precision, in which low-cost Fisher information evaluation and more detailed finite-sample Monte Carlo simulation are combined. The simulation studies show that reward probability reversal leads to a high estimation precision for the learning rate parameter. By contrast, for the inverse temperature parameter, a larger difference in reward probability between options leads to higher estimation precision. These results reveal that the optimal experimental design is dependent on which trait parameters of the Q-learning model are of interest to researchers. Further, it is found that the use of undesirable stimuli in terms of trait parameter precision leads to a large bias in the correlation coefficient estimate. Based on the results, the approaches to designing experiments in the Q-learning model are discussed.

Data Availability

The datasets generated and/or analyzed during the current study are available from the Open Science Framework repository at https://osf.io/msf5e/.

Code Availability

The R code used to produce the results in the five simulation studies is available from the Open Science Framework repository at https://osf.io/msf5e/.

Funding

This work was supported by grants from the Japan Society for the Promotion of Science (Grant Numbers 1920J22350, 18H03612).

Author information

Corresponding author

Correspondence to Kazuya Fujita.

Ethics declarations

Ethical Approval

Not applicable.

Consent to Publish

Not applicable.

Consent to Participate

Not applicable.

Competing Interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 493 KB)

Appendix 1

In this appendix, we derive the Fisher information matrix for CAT. As noted previously, the key here is to regard \({r}_{t,k}\) as a random variable. Then, the log-likelihood is given by

$$\begin{array}{c}\mathrm{log}p\left({{\varvec{r}}}_{\left({\varvec{T}}\right)},\boldsymbol{ }{{\varvec{u}}}_{\left({\varvec{T}}\right)}|{\varvec{\theta}},\boldsymbol{ }{{\varvec{z}}}_{\left({\varvec{T}}\right)}\right)={\sum }_{t=1}^{T}\left\{\mathrm{log}p\left({{\varvec{u}}}_{{\varvec{t}}}|{\varvec{\theta}},\boldsymbol{ }{{\varvec{z}}}_{{\varvec{t}}-1}\right)+\mathrm{log}p\left({{\varvec{r}}}_{{\varvec{t}}}|{{\varvec{u}}}_{{\varvec{t}}},{\varvec{\theta}},\boldsymbol{ }{{\varvec{z}}}_{{\varvec{t}}-1}\right)\right\}.\end{array}$$
(A1)

However, researchers can ignore \(\mathrm{log}p\left({{\varvec{r}}}_{\left({\varvec{T}}\right)}|{{\varvec{u}}}_{\left({\varvec{T}}\right)},{\varvec{\theta}},\boldsymbol{ }{{\varvec{z}}}_{\left({\varvec{T}}\right)}\right)\) during the calculation of the derivative and estimation because \(\mathrm{log}p\left({{\varvec{r}}}_{\left({\varvec{T}}\right)}|{{\varvec{u}}}_{\left({\varvec{T}}\right)},{\varvec{\theta}},\boldsymbol{ }{{\varvec{z}}}_{\left({\varvec{T}}\right)}\right)\) is independent of the participant parameter \({\varvec{\theta}}\). That is, it holds that

$$\begin{array}{c}\frac{\partial }{\partial{\varvec{\theta}}}\mathrm{log}p\left({{\varvec{r}}}_{\left({\varvec{T}}\right)},\boldsymbol{ }{{\varvec{u}}}_{\left({\varvec{T}}\right)}|{\varvec{\theta}},\boldsymbol{ }{{\varvec{z}}}_{\left({\varvec{T}}\right)}\right)=\frac{\partial }{\partial{\varvec{\theta}}}\mathrm{log}p\left({{\varvec{u}}}_{\left({\varvec{T}}\right)}|{\varvec{\theta}},\boldsymbol{ }{{\varvec{z}}}_{\left({\varvec{T}}\right)}\right).\end{array}$$
(A2)

Although the data-generating process of reward \({r}_{t,k}\), \(\mathrm{log}p\left({{\varvec{r}}}_{\left({\varvec{T}}\right)}|{{\varvec{u}}}_{\left({\varvec{T}}\right)},{\varvec{\theta}},\boldsymbol{ }{{\varvec{z}}}_{\left({\varvec{T}}\right)}\right)\), has often been ignored, it is modeled herein to conduct CAT (Supplement C). Considering the equations, we can derive Eqs. (17) to (20) as in the “Deriving the Fisher Information Matrix Using Obtained Observations” section.
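The decomposition in Eqs. (A1) and (A2) can be illustrated with a minimal Python sketch. It is not the authors' R code. It assumes a two-option task, a softmax choice rule with inverse temperature `beta`, initial Q-values of zero, and binary rewards drawn with design-specified probabilities; the function name `q_learning_loglik` and its arguments are hypothetical. Under these assumptions, only the choice term changes with the participant parameters, whereas the reward term is the same for any value of them.

```python
import math

def q_learning_loglik(choices, rewards, alpha, beta, reward_probs):
    """Return (choice_loglik, reward_loglik) for a two-option task.

    choices: option indices (0 or 1) per trial
    rewards: binary outcomes (0 or 1) per trial
    reward_probs: per-trial pairs of reward probabilities, one per option
    All modelling choices here are assumptions made for illustration.
    """
    q = [0.0, 0.0]          # assumed initial Q-values
    choice_ll = 0.0
    reward_ll = 0.0
    for u, r, rp in zip(choices, rewards, reward_probs):
        # Choice term: depends on (alpha, beta) through the Q-values.
        p0 = 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))
        choice_ll += math.log(p0 if u == 0 else 1.0 - p0)
        # Reward term: depends only on the design and the chosen option.
        p_r = rp[u]
        reward_ll += math.log(p_r if r == 1 else 1.0 - p_r)
        # Q-learning update of the chosen option only.
        q[u] += alpha * (r - q[u])
    return choice_ll, reward_ll

choices = [0, 1, 0, 0, 1]
rewards = [1, 0, 1, 0, 1]
reward_probs = [(0.7, 0.3)] * 5
ll_a = q_learning_loglik(choices, rewards, 0.2, 2.0, reward_probs)
ll_b = q_learning_loglik(choices, rewards, 0.6, 4.0, reward_probs)
print(ll_a, ll_b)
```

Because the reward term is identical for both parameter settings, it drops out of any derivative with respect to the participant parameters, which is the content of Eq. (A2).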

The expectations \({\mathrm{E}}_{\left({{\varvec{u}}}_{\left({\varvec{t}}\right)},{{\varvec{r}}}_{\left({\varvec{t}}\right)}\right)}\left[\left(\frac{\partial }{\partial \alpha }{Q}_{t,k}\left({\varvec{\theta}}\right)\right)\left(\frac{\partial }{\partial \alpha }{Q}_{t,{k}^{^{\prime}}}\left({\varvec{\theta}}\right)\right)\right]\), \({\mathrm{E}}_{\left({{\varvec{u}}}_{\left({\varvec{t}}\right)},{{\varvec{r}}}_{\left({\varvec{t}}\right)}\right)}\left[\left(\frac{\partial }{\partial \alpha }{Q}_{t,k}\left({\varvec{\theta}}\right)\right){Q}_{t,{k}^{^{\prime}}}\left({\varvec{\theta}}\right)\right]\), and \({\mathrm{E}}_{\left({{\varvec{u}}}_{\left({\varvec{t}}\right)},{{\varvec{r}}}_{\left({\varvec{t}}\right)}\right)}\left[{Q}_{t,k}\left({\varvec{\theta}}\right){Q}_{t,{k}^{^{\prime}}}\left({\varvec{\theta}}\right)\right]\) are required to calculate Eqs. (18)–(20). From Eq. (16), \(\left(\frac{\partial }{\partial \alpha }{Q}_{t,k}\left({\varvec{\theta}}\right)\right)\left(\frac{\partial }{\partial \alpha }{Q}_{t,{k}^{^{\prime}}}\left({\varvec{\theta}}\right)\right)\) is given by

$$\begin{array}{c}\left(\frac{\partial }{\partial \alpha }{Q}_{t,k}\left({\varvec{\theta}}\right)\right)\left(\frac{\partial }{\partial \alpha }{Q}_{t,{k}^{^{\prime}}}\left({\varvec{\theta}}\right)\right)= \\ \left(1-\mathrm{I}\left(k\right)\alpha \right)\left(1-\mathrm{I}\left({k}^{^{\prime}}\right)\alpha \right)\left[\left(\frac{\partial }{\partial \alpha }{Q}_{t-1,k}\left({\varvec{\theta}}\right)\right)\left(\frac{\partial }{\partial \alpha }{Q}_{t-1,{k}^{^{\prime}}}\left({\varvec{\theta}}\right)\right)\right]+\\ \left(1-\mathrm{I}\left(k\right)\alpha \right)I\left({k}^{^{\prime}}\right)\left[\left(\frac{\partial }{\partial \alpha }{Q}_{t-1,k}\left({\varvec{\theta}}\right)\right)\right]{\Delta }_{t-1,{k}^{^{\prime}}}+ \\ \left(1-\mathrm{I}\left({k}^{^{\prime}}\right)\alpha \right)I\left(k\right)\left[\left(\frac{\partial }{\partial \alpha }{Q}_{t-1,{k}^{^{\prime}}}\left({\varvec{\theta}}\right)\right)\right]{\Delta }_{t-1,k}+ \\ I\left(k\right)I\left({k}^{^{\prime}}\right){\Delta }_{t-1,k}{\Delta }_{t-1,{k}^{^{\prime}}} ,\end{array}$$
(A3)

where \({\Delta }_{t,k}={r}_{t,k}-{Q}_{t,k}\left({\varvec{\theta}}\right)\) and \(\mathrm{I}\left(k\right)={\mathrm{I}}_{\mathrm{A}}\left(k-1+{u}_{t-1}\right)\). The factor \(\left(1-\mathrm{I}\left(k\right)\alpha \right)\left(1-\mathrm{I}\left({k}^{^{\prime}}\right)\alpha \right)\) depends only on \({u}_{t-1}\), whereas \(\left(\frac{\partial }{\partial \alpha }{Q}_{t-1,k}\left({\varvec{\theta}}\right)\right)\left(\frac{\partial }{\partial \alpha }{Q}_{t-1,{k}^{^{\prime}}}\left({\varvec{\theta}}\right)\right)\) depends on \({{\varvec{u}}}_{\left({\varvec{t}}-2\right)},{{\varvec{r}}}_{({\varvec{t}}-2)}\) through the update equation. The other terms can be treated in the same way. Therefore, we can calculate the expectations separately as follows:

$$\begin{array}{c}{\mathrm{E}}_{\left({{\varvec{u}}}_{\left({\varvec{t}}\right)},{{\varvec{r}}}_{\left({\varvec{t}}\right)}\right)}\left[\left(\frac{\partial }{\partial \alpha }{Q}_{t,k}\left({\varvec{\theta}}\right)\right)\left(\frac{\partial }{\partial \alpha }{Q}_{t,{k}^{^{\prime}}}\left({\varvec{\theta}}\right)\right)\right]= \\ {\mathrm{E}}_{{{\varvec{u}}}_{\left({\varvec{t}}-1\right)}}\left[\left(1-\mathrm{I}\left(k\right)\alpha \right)\left(1-\mathrm{I}\left({k}^{^{\prime}}\right)\alpha \right)\right] {\mathrm{E}}_{\left({{\varvec{u}}}_{\left({\varvec{t}}-2\right)},{{\varvec{r}}}_{\left({\varvec{t}}-1\right)}\right)}\left[\left(\frac{\partial }{\partial \alpha }{Q}_{t-1,k}\left({\varvec{\theta}}\right)\right)\left(\frac{\partial }{\partial \alpha }{Q}_{t-1,{k}^{^{\prime}}}\left({\varvec{\theta}}\right)\right)\right]+\\ {\mathrm{E}}_{{{\varvec{u}}}_{\left({\varvec{t}}-1\right)}}\left[\left(1-\mathrm{I}\left(k\right)\alpha \right)\mathrm{I}\left({k}^{^{\prime}}\right)\right] {\mathrm{E}}_{\left({{\varvec{u}}}_{\left({\varvec{t}}-2\right)},{{\varvec{r}}}_{\left({\varvec{t}}-1\right)}\right)}\left[\left[\left(\frac{\partial }{\partial \alpha }{Q}_{t-1,k}\left({\varvec{\theta}}\right)\right)\right]{\Delta }_{t-1,{k}^{^{\prime}}}\right]+ \\ {\mathrm{E}}_{{{\varvec{u}}}_{\left({\varvec{t}}-1\right)}}\left[\left(1-\mathrm{I}\left({k}^{^{\prime}}\right)\alpha \right)\mathrm{I}\left(k\right)\right] {\mathrm{E}}_{\left({{\varvec{u}}}_{\left({\varvec{t}}-2\right)},{{\varvec{r}}}_{\left({\varvec{t}}-1\right)}\right)}\left[\left[\left(\frac{\partial }{\partial \alpha }{Q}_{t-1,{k}^{^{\prime}}}\left({\varvec{\theta}}\right)\right)\right]{\Delta }_{t-1,k}\right]+ \\ {\mathrm{E}}_{{{\varvec{u}}}_{\left({\varvec{t}}-1\right)}}\left[\mathrm{I}\left(k\right)\mathrm{I}\left({k}^{^{\prime}}\right)\right] {\mathrm{E}}_{\left({{\varvec{u}}}_{\left({\varvec{t}}-2\right)},{{\varvec{r}}}_{\left({\varvec{t}}-1\right)}\right)}\left[{\Delta }_{t-1,k}{\Delta }_{t-1,{k}^{^{\prime}}}\right],\end{array}$$
(A4)

where

$$\begin{array}{c}{\mathrm{E}}_{{{\varvec{u}}}_{\left({\varvec{t}}-1\right)}}\left[\left(1-\mathrm{I}\left(k\right)\alpha \right)\left(1-\mathrm{I}\left({k}^{^{\prime}}\right)\alpha \right)\right]={\delta }_{k{k}^{^{\prime}}}\left\{\mathrm{E}\left[\mathrm{I}\left(k\right)\right]{\left(1-\alpha \right)}^{2}+\left(1-\mathrm{E}\left[\mathrm{I}\left(k\right)\right]\right)\right\}+\left(1-{\delta }_{k{k}^{^{\prime}}}\right)\left(1-\alpha \right),\\ {\mathrm{E}}_{{{\varvec{u}}}_{\left({\varvec{t}}-1\right)}}\left[\left(1-\mathrm{I}\left(k\right)\alpha \right)\mathrm{I}\left({k}^{^{\prime}}\right)\right]=\left(1-{\delta }_{k{k}^{^{\prime}}}\alpha \right)\mathrm{E}\left[\mathrm{I}\left({k}^{^{\prime}}\right)\right],\\ {\mathrm{E}}_{{{\varvec{u}}}_{\left({\varvec{t}}-1\right)}}\left[\mathrm{I}\left(k\right)\mathrm{I}\left({k}^{^{\prime}}\right)\right]={\delta }_{k{k}^{^{\prime}}}\mathrm{E}\left[\mathrm{I}\left(k\right)\right].\end{array}$$
(A5)

Note that the Kronecker delta \({\delta }_{k{k}^{^{\prime}}}\) and \(\mathrm{E}\left[\mathrm{I}\left(k\right)\right]\) are given by

$$\begin{array}{c}{\delta }_{k{k}^{^{\prime}}}=\left\{\begin{array}{c}1 , if \: k={k}^{^{\prime}}\\ 0 , if \: k\ne {k}^{^{\prime}}\end{array}\right.\\ \mathrm{E}\left[\mathrm{I}\left(k\right)\right]=\left\{\begin{array}{c}p\left({u}_{t-1}=1|{\varvec{\theta}}\right), if \: k=1\\ 1-p\left({u}_{t-1}=1|{\varvec{\theta}}\right), if \: k=2\end{array}\right.,\end{array}$$
(A6)

respectively. For example, \({\mathrm{E}}_{{{\varvec{u}}}_{\left({\varvec{t}}-1\right)}}\left[\left(1-\mathrm{I}\left(k\right)\alpha \right)\mathrm{I}\left({k}^{^{\prime}}\right)\right]\) is \(p\left({u}_{t-1}=1|{\varvec{\theta}}\right)\left(1-\alpha \right)\) if \(k={k}^{^{\prime}}=1\), \(1-p\left({u}_{t-1}=1|{\varvec{\theta}}\right)\) if \(k=1, {k}^{^{\prime}}=2\), \(p\left({u}_{t-1}=1|{\varvec{\theta}}\right)\) if \(k=2, {k}^{^{\prime}}=1\), and \(\left(1-p\left({u}_{t-1}=1|{\varvec{\theta}}\right)\right)\left(1-\alpha \right)\) if \(k={k}^{^{\prime}}=2\). Therefore, \({\mathrm{E}}_{{{\varvec{u}}}_{\left({\varvec{t}}-1\right)}}\left[\left(1-\mathrm{I}\left(k\right)\alpha \right)\mathrm{I}\left({k}^{^{\prime}}\right)\right]\) can be calculated using Eq. (A5). In the same way, \({\mathrm{E}}_{\left({{\varvec{u}}}_{\left({\varvec{t}}\right)},{{\varvec{r}}}_{\left({\varvec{t}}\right)}\right)}\left[\left(\frac{\partial }{\partial \alpha }{Q}_{t,k}\left({\varvec{\theta}}\right)\right){Q}_{t,{k}^{^{\prime}}}\left({\varvec{\theta}}\right)\right]\) and \({\mathrm{E}}_{\left({{\varvec{u}}}_{\left({\varvec{t}}\right)},{{\varvec{r}}}_{\left({\varvec{t}}\right)}\right)}\left[{Q}_{t,k}\left({\varvec{\theta}}\right){Q}_{t,{k}^{^{\prime}}}\left({\varvec{\theta}}\right)\right]\) are calculated as follows:

$$\begin{array}{c}{\mathrm{E}}_{\left({{\varvec{u}}}_{\left({\varvec{t}}\right)},{{\varvec{r}}}_{\left({\varvec{t}}\right)}\right)}\left[\left(\frac{\partial }{\partial \alpha }{Q}_{t,k}\left({\varvec{\theta}}\right)\right){Q}_{t,{k}^{^{\prime}}}\left({\varvec{\theta}}\right)\right]= \\ {\mathrm{E}}_{{u}_{\left(t-1\right)}}\left[1-\mathrm{I}\left(k\right)\alpha \right] {\mathrm{E}}_{\left({u}_{\left(t-2\right)},{r}_{\left(t-1\right)}\right)}\left[\left(\frac{\partial }{\partial \alpha }{Q}_{t-1,k}\left({\varvec{\theta}}\right)\right){Q}_{t-1,{k}^{^{\prime}}}\left({\varvec{\theta}}\right)\right]+ \\ {\mathrm{E}}_{{u}_{\left(t-1\right)}}\left[\left(1-\mathrm{I}\left(k\right)\alpha \right)\mathrm{I}\left({k}^{^{\prime}}\right)\right]\alpha {\mathrm{E}}_{\left({u}_{\left(t-2\right)},{r}_{\left(t-1\right)}\right)}\left[\left(\frac{\partial }{\partial \alpha }{Q}_{t-1,k}\left({\varvec{\theta}}\right)\right){\Delta }_{t-1,{k}^{^{\prime}}}\right]+\\ \mathrm{E}\left[\mathrm{I}\left(k\right)\right] {\mathrm{E}}_{\left({u}_{\left(t-2\right)},{r}_{\left(t-1\right)}\right)}\left[{\Delta }_{t-1,k}{Q}_{t-1,{k}^{^{\prime}}}\left({\varvec{\theta}}\right)\right]+ \\ {\mathrm{E}}_{{u}_{\left(t-1\right)}}\left[\mathrm{I}\left(k\right)\mathrm{I}\left({k}^{^{\prime}}\right)\right]\alpha {\mathrm{E}}_{\left({u}_{\left(t-2\right)},{r}_{\left(t-1\right)}\right)}\left[{\Delta }_{t-1,k}{\Delta }_{t-1,{k}^{^{\prime}}}\right],\end{array}$$
(A7)
$$\begin{array}{c}{\mathrm{E}}_{\left({{\varvec{u}}}_{\left({\varvec{t}}\right)},{{\varvec{r}}}_{\left({\varvec{t}}\right)}\right)}\left[{Q}_{t,k}\left({\varvec{\theta}}\right){Q}_{t,{k}^{^{\prime}}}\left({\varvec{\theta}}\right)\right]= \\ {\mathrm{E}}_{\left({{\varvec{u}}}_{\left({\varvec{t}}-2\right)},{{\varvec{r}}}_{\left({\varvec{t}}-1\right)}\right)}\left[{Q}_{t-1,k}\left({\varvec{\theta}}\right){Q}_{t-1,{k}^{^{\prime}}}\left({\varvec{\theta}}\right)\right]+ \\ E\left[\mathrm{I}\left({k}^{^{\prime}}\right)\right]\alpha {\mathrm{E}}_{\left({{\varvec{u}}}_{\left({\varvec{t}}-2\right)},{{\varvec{r}}}_{\left({\varvec{t}}-1\right)}\right)}\left[{Q}_{t-1,k}\left({\varvec{\theta}}\right){\Delta }_{t-1,{k}^{^{\prime}}}\right]+ \\ E\left[\mathrm{I}\left(k\right)\right]\alpha {\mathrm{E}}_{\left({{\varvec{u}}}_{\left({\varvec{t}}-2\right)},{{\varvec{r}}}_{\left({\varvec{t}}-1\right)}\right)}\left[{Q}_{t-1,{k}^{^{\prime}}}\left({\varvec{\theta}}\right){\Delta }_{t-1,k}\right]+ \\ {\mathrm{E}}_{{{\varvec{u}}}_{\left({\varvec{t}}-1\right)}}\left[\mathrm{I}\left(k\right)\mathrm{I}\left({k}^{^{\prime}}\right)\right]{\alpha }^{2} {\mathrm{E}}_{\left({{\varvec{u}}}_{\left({\varvec{t}}-2\right)},{{\varvec{r}}}_{\left({\varvec{t}}-1\right)}\right)}\left[{\Delta }_{t-1,k}{\Delta }_{t-1,{k}^{^{\prime}}}\right],\end{array}$$
(A8)

where \({\mathrm{E}}_{{{\varvec{u}}}_{\left({\varvec{t}}-1\right)}}\left[1-\mathrm{I}\left(k\right)\alpha \right]=\left(1-\mathrm{E}\left[\mathrm{I}\left(k\right)\right]\alpha \right)\).
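The choice-indicator expectations that enter Eqs. (A4), (A7), and (A8) can be checked numerically by enumerating the two possible values of the previous choice. The following Python sketch is illustrative only and is not the authors' R code; the function names are hypothetical, options are labelled 1 and 2, and `p1` stands for \(p\left({u}_{t-1}=1|{\varvec{\theta}}\right)\). The closed forms follow the four-case worked example given in the text.

```python
def indicator(k, chosen):
    """I(k): 1 if option k was the option chosen on trial t-1, else 0."""
    return 1.0 if k == chosen else 0.0

def enumerated(p1, alpha, k, kp):
    """Expectations over the previous choice by direct enumeration."""
    probs = {1: p1, 2: 1.0 - p1}
    e_both = sum(pr * (1 - indicator(k, c) * alpha) * (1 - indicator(kp, c) * alpha)
                 for c, pr in probs.items())
    e_mixed = sum(pr * (1 - indicator(k, c) * alpha) * indicator(kp, c)
                  for c, pr in probs.items())
    e_ind = sum(pr * indicator(k, c) * indicator(kp, c)
                for c, pr in probs.items())
    return e_both, e_mixed, e_ind

def closed_form(p1, alpha, k, kp):
    """Closed forms written with the Kronecker delta and E[I(k)]."""
    delta = 1.0 if k == kp else 0.0
    e_i = {1: p1, 2: 1.0 - p1}          # E[I(k)] for k = 1, 2
    e_both = (delta * (e_i[k] * (1 - alpha) ** 2 + (1 - e_i[k]))
              + (1 - delta) * (1 - alpha))
    e_mixed = (1 - delta * alpha) * e_i[kp]
    e_ind = delta * e_i[k]
    return e_both, e_mixed, e_ind

for k in (1, 2):
    for kp in (1, 2):
        print(k, kp, enumerated(0.7, 0.3, k, kp), closed_form(0.7, 0.3, k, kp))
```

The enumeration relies on exactly one of the two options being chosen on each trial, so for \(k\ne {k}^{^{\prime}}\) the two indicators can never both equal 1.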

Further, \({\mathrm{E}}_{\left({{\varvec{u}}}_{\left({\varvec{t}}\right)},{{\varvec{r}}}_{\left({\varvec{t}}\right)}\right)}\left[\frac{\partial }{\partial \alpha }{Q}_{t,k}\left({\varvec{\theta}}\right)\right]\), \({\mathrm{E}}_{\left({{\varvec{u}}}_{\left({\varvec{t}}\right)},{{\varvec{r}}}_{\left({\varvec{t}}\right)}\right)}\left[{Q}_{t,k}\left({\varvec{\theta}}\right)\right]\), \({\mathrm{E}}_{\left({{\varvec{u}}}_{\left({\varvec{t}}\right)},{{\varvec{r}}}_{\left({\varvec{t}}\right)}\right)}\left[{r}_{t, k}\right]\), and \({\mathrm{E}}_{\left({{\varvec{u}}}_{\left({\varvec{t}}\right)},{{\varvec{r}}}_{\left({\varvec{t}}\right)}\right)}\left[{r}_{t, k}^{2}\right]\) are required to calculate Eqs. (A4), (A7), and (A8). Considering the update equations, Eqs. (3) and (16), \({\mathrm{E}}_{\left({{\varvec{u}}}_{\left({\varvec{t}}\right)},{{\varvec{r}}}_{\left({\varvec{t}}\right)}\right)}\left[{Q}_{t,k}\left({\varvec{\theta}}\right)\right]\) and \({\mathrm{E}}_{\left({{\varvec{u}}}_{\left({\varvec{t}}\right)},{{\varvec{r}}}_{\left({\varvec{t}}\right)}\right)}\left[\frac{\partial }{\partial \alpha }{Q}_{t,k}\left({\varvec{\theta}}\right)\right]\) are given by

$$\begin{array}{c}{\mathrm{E}}_{\left({{\varvec{u}}}_{\left({\varvec{t}}\right)},{{\varvec{r}}}_{\left({\varvec{t}}\right)}\right)}\left[{Q}_{t,k}\left({\varvec{\theta}}\right)\right]={\mathrm{E}}_{\left({{\varvec{u}}}_{\left({\varvec{t}}-2\right)},{{\varvec{r}}}_{\left({\varvec{t}}-2\right)}\right)}\left[{Q}_{t-1,k}\left({\varvec{\theta}}\right)\right]+\mathrm{E}\left[\mathrm{I}\left(k\right)\right]\alpha {\mathrm{E}}_{\left({{\varvec{u}}}_{\left({\varvec{t}}-2\right)},{{\varvec{r}}}_{\left({\varvec{t}}-1\right)}\right)}\left[{\Delta }_{t-1,k}\right],\\ \begin{array}{c}{\mathrm{E}}_{\left({{\varvec{u}}}_{\left({\varvec{t}}\right)},{{\varvec{r}}}_{\left({\varvec{t}}\right)}\right)}\left[\frac{\partial }{\partial \alpha }{Q}_{t,k}\left({\varvec{\theta}}\right)\right]= {\mathrm{E}}_{{{\varvec{u}}}_{\left({\varvec{t}}-1\right)}}\left[1-\mathrm{I}\left(k\right)\alpha \right] {\mathrm{E}}_{\left({{\varvec{u}}}_{\left({\varvec{t}}-2\right)},{{\varvec{r}}}_{\left({\varvec{t}}-2\right)}\right)}\left[\frac{\partial }{\partial \alpha }{Q}_{t-1,k}\left({\varvec{\theta}}\right)\right]+\\ \mathrm{E}\left[\mathrm{I}\left(k\right)\right] {\mathrm{E}}_{\left({{\varvec{u}}}_{\left({\varvec{t}}-2\right)},{{\varvec{r}}}_{\left({\varvec{t}}-1\right)}\right)}\left[{\Delta }_{t-1,k}\right],\end{array}\end{array}$$
(A9)

respectively. Considering the data-generating process of \({r}_{t,k}\), Eq. (6), \({\mathrm{E}}_{\left({{\varvec{u}}}_{\left({\varvec{t}}\right)},{{\varvec{r}}}_{\left({\varvec{t}}\right)}\right)}\left[{r}_{t, k}\right]\) and \({\mathrm{E}}_{\left({{\varvec{u}}}_{\left({\varvec{t}}\right)},{{\varvec{r}}}_{\left({\varvec{t}}\right)}\right)}\left[{r}_{t, k}^{2}\right]\) are given by

$$\begin{array}{c}{\mathrm{E}}_{\left({{\varvec{u}}}_{\left({\varvec{t}}\right)},{{\varvec{r}}}_{\left({\varvec{t}}\right)}\right)}\left[{r}_{t, k}\right]={Re}_{t,1}p\left({u}_{t-1}=1|{\varvec{\theta}}\right)\left(2-k\right){RP}_{t,1}+{Re}_{t,2}\left(1-p\left({u}_{t-1}=1|{\varvec{\theta}}\right)\right)\left(k-1\right){RP}_{t,2},\\ \begin{array}{c}{\mathrm{E}}_{\left({{\varvec{u}}}_{\left({\varvec{t}}\right)},{{\varvec{r}}}_{\left({\varvec{t}}\right)}\right)}\left[{r}_{t, k}^{2}\right]={Re}_{t,1}^{2}p\left({u}_{t-1}=1|{\varvec{\theta}}\right)\left(2-k\right)R{P}_{t,1}+\\ {Re}_{t,2}^{2}\left(1-p\left({u}_{t-1}=1|{\varvec{\theta}}\right)\right)\left(k-1\right){RP}_{t,2}, \end{array}\end{array}$$
(A10)

respectively. Our R code is based on the above equations.
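As a sanity check on the derivative recursion that underlies Eq. (A3), the following minimal Python sketch (not the authors' R code) propagates the Q-values and their derivatives with respect to the learning rate along one fixed sequence of choices and rewards, and compares the result with a central finite difference. The function name `q_and_gradient`, the zero initial values, and the 0/1 option indexing are assumptions made for illustration.

```python
def q_and_gradient(choices, rewards, alpha):
    """Final Q-values and dQ/dalpha for a fixed choice/reward sequence.

    Only the chosen option (indicator I(k) = 1) is updated on each trial:
        dQ_t = (1 - alpha) * dQ_{t-1} + Delta_{t-1}
        Q_t  = Q_{t-1} + alpha * Delta_{t-1}
    Unchosen options (I(k) = 0) keep their previous Q and dQ.
    """
    q = [0.0, 0.0]    # assumed initial Q-values
    dq = [0.0, 0.0]   # derivatives of the initial values are zero
    for u, r in zip(choices, rewards):
        delta = r - q[u]                       # prediction error
        dq[u] = (1.0 - alpha) * dq[u] + delta  # derivative recursion
        q[u] = q[u] + alpha * delta            # Q-learning update
    return q, dq

choices = [0, 1, 0, 0, 1, 1, 0]
rewards = [1, 0, 1, 0, 1, 1, 0]
alpha, h = 0.3, 1e-6

q, dq = q_and_gradient(choices, rewards, alpha)
q_hi, _ = q_and_gradient(choices, rewards, alpha + h)
q_lo, _ = q_and_gradient(choices, rewards, alpha - h)
finite_diff = [(a - b) / (2 * h) for a, b in zip(q_hi, q_lo)]
print(dq, finite_diff)
```

The expectations in Eqs. (A4) to (A10) average this same recursion over the random choices and rewards; the sketch checks only the deterministic recursion for one realized sequence.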

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Fujita, K., Okada, K. & Katahira, K. Stimulus Selection in a Q-learning Model Using Fisher Information and Monte Carlo Simulation. Comput Brain Behav 6, 262–279 (2023). https://doi.org/10.1007/s42113-022-00163-0

  • DOI: https://doi.org/10.1007/s42113-022-00163-0
