Portfolio Allocation with Dynamic Risk Preferences via Reinforcement Learning

  • Published:
Computational Economics

Abstract

In investment practice, the mean–variance model is an effective method for constructing portfolios: it rests on a robust economic theory and is widely used in both academia and industry. However, there is currently no satisfactory approach for determining investors' risk preference parameters within the model. This paper proposes a novel reinforcement learning (RL) framework integrated with the mean–variance model that dynamically adjusts investors' risk preference parameters during portfolio construction. Our RL portfolio is readily implementable and has strong economic interpretability. In an empirical analysis using Taiwan 50 Index market data, the RL portfolio outperforms both the buy-and-hold strategy and portfolios with static risk preference parameters. Moreover, with our carefully designed reward function, RL selects suitable risk preferences more accurately when market return differences are more pronounced, underscoring the effectiveness of RL methods for dynamically adjusting risk preference parameters during periods of elevated market volatility.
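For reference, a common textbook form of the mean–variance problem with risk aversion parameter λ maximizes w'μ − (λ/2) w'Σw subject to the weights summing to one. The Python sketch below solves that problem in closed form, assuming short sales are allowed and no other constraints apply; the paper's exact objective and constraints are not given in this excerpt, and the function name `mean_variance_weights` and the example inputs are hypothetical. A large λ pushes the weights toward the minimum variance portfolio, while a small λ tilts them toward higher expected return, which is the kind of risk preference parameter the RL framework adjusts.

```python
import numpy as np


def mean_variance_weights(mu, sigma, risk_aversion):
    """Maximize w @ mu - (risk_aversion / 2) * w @ sigma @ w subject to sum(w) == 1.

    Hypothetical helper: assumes short sales are allowed and sigma is positive definite.
    """
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    ones = np.ones_like(mu)
    inv_ones = np.linalg.solve(sigma, ones)  # Sigma^{-1} 1
    inv_mu = np.linalg.solve(sigma, mu)      # Sigma^{-1} mu
    # Global minimum variance portfolio (the limit as risk_aversion -> infinity).
    w_minvar = inv_ones / (ones @ inv_ones)
    # Return-seeking tilt; its weights sum to zero, so the budget constraint
    # holds for any risk_aversion.
    tilt = inv_mu - (ones @ inv_mu) * w_minvar
    return w_minvar + tilt / risk_aversion


# Hypothetical daily expected returns and covariance for three stocks.
mu = np.array([0.0010, 0.0004, 0.0007])
sigma = np.array([[0.00040, 0.00010, 0.00008],
                  [0.00010, 0.00020, 0.00005],
                  [0.00008, 0.00005, 0.00030]])

for lam in (1.0, 10.0, 100.0):
    w = mean_variance_weights(mu, sigma, lam)
    print(f"lambda={lam:>5}: weights={np.round(w, 3)}, variance={w @ sigma @ w:.6f}")
```

Because the tilt term sums to zero, every λ yields a fully invested portfolio; only the split between the minimum variance component and the return-seeking tilt changes, and portfolio variance falls as λ grows.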

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8

Notes

  1. We use the past 5 days' returns to measure the standard deviation for two reasons. First, the past 5 trading days capture the stock market's performance over the past week. Second, from a time-series perspective, a daily trading strategy should not rely on information from too far in the past to measure the standard deviation, because doing so would introduce too much noise.

  2. If the number of historical return days m does not exceed the number of constituent stocks, the sample covariance matrix is singular (its rank is at most m − 1), so the inverse of the covariance matrix required by the mean–variance model cannot be estimated and the portfolio weights cannot be calculated.

  3. We also evaluate the maximum return portfolio and the minimum variance portfolio with m set to 120 and 240 historical return days. Their performance is similar to that obtained with 60 days, so the following discussion focuses only on the 60-day case.
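Notes 1 and 2 describe two preprocessing computations. The sketch below, assuming daily simple returns stored in NumPy arrays, computes a trailing 5-day sample standard deviation and checks whether the sample covariance of an m × N return matrix is invertible. The helper names `trailing_std` and `covariance_is_invertible`, the choice `ddof=1`, the window alignment, and the relative eigenvalue tolerance are illustrative assumptions, not the authors' code.

```python
import numpy as np


def trailing_std(returns, window=5):
    """Sample standard deviation over each run of `window` consecutive daily returns.

    Element k covers returns[k : k + window]; ddof=1 is an assumption.
    """
    r = np.asarray(returns, dtype=float)
    windows = np.lib.stride_tricks.sliding_window_view(r, window)
    return windows.std(axis=1, ddof=1)


def covariance_is_invertible(return_matrix, rel_tol=1e-10):
    """return_matrix has shape (m days, N stocks).

    The sample covariance has rank at most m - 1, so it is singular when m <= N.
    """
    cov = np.cov(return_matrix, rowvar=False)
    eig = np.linalg.eigvalsh(cov)  # eigenvalues in ascending order
    return bool(eig[0] > rel_tol * eig[-1])


daily = np.array([0.012, -0.004, 0.007, -0.011, 0.003, 0.009, -0.002])
print(trailing_std(daily))  # one value per 5-day window: 3 windows here

rng = np.random.default_rng(0)
n_stocks = 50  # matches the 50 constituents of the Taiwan 50 Index
print(covariance_is_invertible(rng.normal(size=(30, n_stocks))))  # m = 30 < N
print(covariance_is_invertible(rng.normal(size=(60, n_stocks))))  # m = 60 > N
```

With m = 60 and 50 stocks, the setting used in the main analysis, the covariance matrix is full rank, whereas fewer than about 51 days of history would leave it singular.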

Funding

The authors have not disclosed any funding.

Author information

Corresponding author

Correspondence to Shih-Kuei Lin.

Ethics declarations

Conflict of interest

No potential conflict of interest was reported by the author(s).

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Before using reinforcement learning to adjust the risk aversion parameter dynamically, we first examine the investment results of selecting, each day, whichever of the maximum return portfolio and the minimum variance portfolio earns the higher return. Figure 9 shows the cumulative returns of always making this correct selection under different parameter settings. The blue, orange, and green lines represent models with historical state n and historical return volatility days m set to 60, 120, and 240 days, respectively. The cumulative returns are stable and positive, indicating that substantial gains are achievable if reinforcement learning can correctly adjust the risk aversion parameter each day. This result also supports the feasibility of our experiment. Moreover, in the correct portfolio, the return of the Taiwan 50 Index is negatively correlated with the risk aversion parameter: the correlation coefficient is \(-0.26\) for \(n = m = 60\), \(-0.29\) for \(n = m = 120\), and \(-0.29\) for \(n = m = 240\). Therefore, the higher the return of the Taiwan 50 Index, the smaller the risk aversion parameter should be. These results further demonstrate the need to adjust the risk aversion parameter dynamically each day.
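The hindsight benchmark described above can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' data or code: it assumes daily returns of the maximum return and minimum variance portfolios are given, encodes the chosen risk aversion as a hypothetical 0/1 indicator (0 for the maximum return portfolio, 1 for the minimum variance portfolio), and compounds the chosen returns to form the cumulative series. The simulated portfolios are hypothetical, so the printed numbers will not match the correlations reported above.

```python
import numpy as np


def oracle_selection(r_maxret, r_minvar):
    """Each day, pick (with hindsight) whichever portfolio earned more.

    Returns a hypothetical 0/1 risk-aversion indicator (1 = minimum variance
    chosen), the chosen daily returns, and their compounded cumulative return.
    """
    r_maxret = np.asarray(r_maxret, dtype=float)
    r_minvar = np.asarray(r_minvar, dtype=float)
    pick_minvar = r_minvar > r_maxret  # True -> high risk aversion that day
    chosen = np.where(pick_minvar, r_minvar, r_maxret)
    cumulative = np.cumprod(1.0 + chosen) - 1.0
    return pick_minvar.astype(float), chosen, cumulative


rng = np.random.default_rng(42)
days = 250
index_ret = rng.normal(0.0004, 0.01, days)                # synthetic index returns
r_maxret = 1.3 * index_ret + rng.normal(0, 0.004, days)   # hypothetical high-beta portfolio
r_minvar = 0.6 * index_ret + rng.normal(0, 0.002, days)   # hypothetical low-beta portfolio

risk_av, chosen, cum = oracle_selection(r_maxret, r_minvar)
print(f"final cumulative return: {cum[-1]:.2%}")
print(f"corr(index return, risk aversion): {np.corrcoef(index_ret, risk_av)[0, 1]:.2f}")
```

In this synthetic setup the higher-beta portfolio tends to win on up days and the lower-beta portfolio on down days, so the indicator is negatively correlated with the index return, mirroring the sign of the relationship described in the text.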

Fig. 9
Cumulative returns of the three correct portfolios. Notes: The blue, orange, and green lines indicate models using historical state n and historical return fluctuation m of 60, 120, and 240 days, respectively. The cumulative returns are stable and positive, indicating that substantial returns can be obtained if reinforcement learning can correctly adjust the risk preference parameters every day. (Color figure online)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, TF., Kuang, XJ., Liao, SL. et al. Portfolio Allocation with Dynamic Risk Preferences via Reinforcement Learning. Comput Econ (2023). https://doi.org/10.1007/s10614-023-10509-w

  • DOI: https://doi.org/10.1007/s10614-023-10509-w
