Abstract
Finding optimal policies for general partially observable Markov decision processes (POMDPs) is computationally difficult, primarily due to the need to perform dynamic-programming (DP) updates over the entire belief space. In this paper, we first study a somewhat restrictive class of POMDPs called almost discernible POMDPs and propose an anytime algorithm called space-progressive value iteration (SPVI). SPVI does not perform DP updates over the entire belief space; rather, it restricts DP updates to a belief subspace that grows over time. It is argued that, given sufficient time, SPVI can find near-optimal policies for almost discernible POMDPs. We then show how SPVI can be applied to a more general class of POMDPs. Empirical results are presented to show the effectiveness of SPVI.
© 2001 Springer-Verlag Berlin Heidelberg
Zhang, N.L., Zhang, W. (2001). Space-Progressive Value Iteration: An Anytime Algorithm for a Class of POMDPs. In: Benferhat, S., Besnard, P. (eds) Symbolic and Quantitative Approaches to Reasoning with Uncertainty. ECSQARU 2001. Lecture Notes in Computer Science, vol 2143. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44652-4_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42464-2
Online ISBN: 978-3-540-44652-1