Numerical Dynamic Programming for Continuous States

Chapter in the book From Shortest Paths to Reinforcement Learning

Part of the book series: EURO Advanced Tutorials on Operational Research (EUROATOR)


Abstract

In this chapter we consider discrete-time DP models featuring continuous state and action spaces. Since the value functions are infinite-dimensional objects in this setting, we need an array of numerical techniques to apply the DP principle.


Notes

  1. We are assuming that the set of basis functions is the same for each time period. In an infinite-horizon problem we would drop the time subscript.
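
     In symbols (our notation, chosen for illustration): with basis functions $\phi_1,\dots,\phi_K$ fixed across periods, the value function at time $t$ is approximated by the linear architecture

     $$\hat{V}_t(s) = \sum_{k=1}^{K} \theta_{t,k}\, \phi_k(s),$$

     so that only the coefficients $\theta_{t,k}$ carry a time subscript.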

  2. See Sect. 4.1.

  3. This is consistent with geometric Brownian motion, which is the solution of the stochastic differential equation $dP_t = \mu P_t\, dt + \sigma P_t\, dW_t$, where $W_t$ is a standard Wiener process. See, e.g., [1, Chapter 11] for details.
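
     As a minimal illustration (our sketch, not a script from the book), geometric Brownian motion can be simulated exactly on a discrete grid, since $P_{t+\delta t} = P_t \exp\{(\mu - \sigma^2/2)\,\delta t + \sigma\sqrt{\delta t}\, Z\}$ with $Z$ standard normal; the parameter values below are hypothetical:

         % Exact simulation of GBM sample paths (illustrative sketch)
         mu = 0.08; sigma = 0.2; P0 = 100;    % hypothetical drift, volatility, initial price
         T = 1; nSteps = 250; nPaths = 1000;  % one year of daily steps
         dt = T / nSteps;
         Z = randn(nSteps, nPaths);           % standard normal increments
         logIncr = (mu - 0.5*sigma^2)*dt + sigma*sqrt(dt)*Z;
         P = [P0*ones(1,nPaths); P0*exp(cumsum(logIncr, 1))];  % one path per column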

  4. Gauss–Hermite quadrature formulas are a standard way to discretize a normal random variable. Since base MATLAB lacks built-in functions for Gaussian quadrature (although MATLAB code is available on the Web), we do not pursue this approach; a self-contained sketch is given below.
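
     For readers who want to pursue it anyway, here is a brief sketch (ours, not the book's) computing Gauss–Hermite nodes and weights via the classical Golub–Welsch approach, i.e., the eigendecomposition of the Jacobi matrix of the Hermite recurrence:

         % Gauss-Hermite nodes/weights via Golub-Welsch (illustrative sketch)
         n = 7;                              % number of quadrature nodes
         b = sqrt((1:n-1)/2);                % off-diagonal of the Jacobi matrix
         J = diag(b,1) + diag(b,-1);         % symmetric tridiagonal matrix
         [V,D] = eig(J);
         [x,idx] = sort(diag(D));            % nodes = eigenvalues
         w = sqrt(pi) * (V(1,idx)').^2;      % weights from first eigenvector row
         % Discretize X ~ N(mu,sigma^2): values mu + sqrt(2)*sigma*x with
         % probabilities w/sqrt(pi), so that E[g(X)] ~ sum(p .* g(vals))
         mu = 0; sigma = 1;
         vals = mu + sqrt(2)*sigma*x;
         p = w / sqrt(pi);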

  5. One may consider a more flexible parameterized rule, adjusting decisions as the end of the planning horizon approaches. This is consistent with common sense: consumption–saving behavior need not be the same for young and older people. It may also be argued that simple rules are more robust to modeling errors. A hypothetical example of such a rule is sketched below.
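
     For instance (our hypothetical rule, not the book's), one might consume an annuity-style, time-varying fraction of current wealth that rises to one at the end of the horizon:

         % Hypothetical time-dependent consumption rule (not from the book):
         % consume a growing fraction of wealth as the horizon end nears
         consumeFrac = @(t,T) 1/(T - t + 1);   % t = 1..T; spends all wealth at t = T
         % e.g., with T = 40: consumeFrac(1,40) is 0.025, consumeFrac(40,40) is 1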

  6. This script is time consuming. The reader is advised to try a smaller number of scenarios, say 100, to get a feel for its behavior.

  7. Readers are invited to modify the script and check that the difference is less striking when a logarithmic utility is used.

  8. In fact, for the utility functions that we consider here, there is theoretical support for simple decision rules, including a constant allocation to the risky asset. The optimal decision rules can be found for a related problem that may be solved exactly by DP; see [11] and [12].
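
     To recall the classical result of [11, 12] for concreteness: in the continuous-time Merton model with power (CRRA) utility of risk-aversion coefficient $\gamma$, risk-free rate $r$, and a risky asset with drift $\mu$ and volatility $\sigma$, the optimal fraction of wealth allocated to the risky asset is the constant

     $$\pi^{\star} = \frac{\mu - r}{\gamma \sigma^2}.$$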

References

  1. Brandimarte, P.: An Introduction to Financial Markets: A Quantitative Approach. Wiley, Hoboken (2018)

  2. Cai, Y., Judd, K.L.: Shape-preserving dynamic programming. Math. Meth. Oper. Res. 77, 407–421 (2013)

  3. Cai, Y., Judd, K.L.: Dynamic programming with Hermite approximation. Math. Meth. Oper. Res. 81, 245–267 (2015)

  4. Campbell, J.Y., Viceira, L.M.: Strategic Asset Allocation. Oxford University Press, Oxford (2002)

  5. Gaggero, M., Gnecco, G., Sanguineti, M.: Dynamic programming and value-function approximation in sequential decision problems: Error analysis and numerical results. J. Optim. Theory Appl. 156, 380–416 (2013)

  6. Grüne, L., Semmler, W.: Asset pricing with dynamic programming. Comput. Econ. 29, 233–265 (2007)

  7. Holtz, M.: Sparse Grid Quadrature in High Dimensions with Applications in Finance and Insurance. Springer, Heidelberg (2011)

  8. Judd, K.L.: Numerical Methods in Economics. MIT Press, Cambridge (1998)

  9. Löhndorf, N.: An empirical analysis of scenario generation methods for stochastic optimization. Eur. J. Oper. Res. 255, 121–132 (2016)

  10. Mehrotra, S., Papp, D.: Generating moment matching scenarios using optimization techniques. SIAM J. Optim. 23, 963–999 (2013)

  11. Merton, R.C.: Lifetime portfolio selection under uncertainty: The continuous-time case. Rev. Econ. Stat. 51, 247–257 (1969)

  12. Merton, R.C.: Optimum consumption and portfolio rules in a continuous-time model. J. Econ. Theory 3, 373–413 (1971)

  13. Miranda, M.J., Fackler, P.L.: Applied Computational Economics and Finance. MIT Press, Cambridge (2002)

  14. Semmler, W., Mueller, M.: A stochastic model of dynamic consumption and portfolio decisions. Comput. Econ. 48, 225–251 (2016)


Copyright information

© 2021 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Brandimarte, P. (2021). Numerical Dynamic Programming for Continuous States. In: From Shortest Paths to Reinforcement Learning. EURO Advanced Tutorials on Operational Research. Springer, Cham. https://doi.org/10.1007/978-3-030-61867-4_6
