
Multi-view reinforcement learning for sequential decision-making with insufficient state information

Original Article | International Journal of Machine Learning and Cybernetics

Abstract

Most reinforcement learning methods describe sequential decision-making as a Markov decision process, in which the effect of an action depends only on the current state. This is reasonable only when the state is correctly defined and the state information is sufficiently observed, so the learning efficiency of reinforcement learning methods based on the Markov decision process is limited when the state information is insufficient. The partially observable Markov decision process and the history-based decision process have been proposed to describe sequential decision-making with insufficient state information. However, both formulations tend to overlook important information in the currently observed state, so the learning efficiency of reinforcement learning methods based on these two processes is also limited when the state information is insufficient. In this paper, we propose a multi-view reinforcement learning method to address this problem. The motivation is that the interaction between the agent and its environment should be considered from the views of history, present, and future to compensate for insufficient state information. Based on these views, we construct a multi-view decision process to describe sequential decision-making with insufficient state information. A multi-view reinforcement learning method is then obtained by combining the multi-view decision process with the actor-critic framework. In the proposed method, multi-view clustering is performed so that each type of sample can be sufficiently exploited. Experiments show that the proposed method is more effective than the compared state-of-the-art methods. The source code can be downloaded from https://github.com/jamieliuestc/MVRL.
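To make the idea above concrete, the sketch below (Python, assuming numpy and scikit-learn) stores replay transitions as three views: a window of past observations (history view), the current observation (present view), and the next observation (future view), and clusters the stored samples before drawing a minibatch so that every cluster contributes. This is a minimal illustrative sketch, not the authors' implementation (see the repository linked above); the buffer design, the plain k-means stand-in for multi-view clustering, and all names and hyperparameters are assumptions made for illustration.

# Minimal sketch of a clustered, multi-view replay buffer (illustrative only).
import numpy as np
from sklearn.cluster import KMeans

class MultiViewReplayBuffer:
    def __init__(self, n_clusters=4):
        self.n_clusters = n_clusters
        self.transitions = []  # (history, obs, action, reward, next_obs)

    def add(self, history, obs, action, reward, next_obs):
        # history: stacked past observations (history view); obs: present view;
        # next_obs: future view. All are fixed-size numpy arrays.
        self.transitions.append((history, obs, action, reward, next_obs))

    def _features(self):
        # Concatenate the three views into one vector per transition,
        # a simple stand-in for a learned multi-view representation.
        return np.stack([np.concatenate([h.ravel(), o.ravel(), o2.ravel()])
                         for h, o, _, _, o2 in self.transitions])

    def sample(self, batch_size, rng=np.random):
        # Cluster the stored transitions and draw roughly the same number of
        # samples from every cluster, so each type of sample is exploited.
        k = min(self.n_clusters, len(self.transitions))
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(self._features())
        per_cluster = max(1, batch_size // k)
        picked = []
        for c in range(k):
            members = np.flatnonzero(labels == c)
            picked.extend(rng.choice(members, per_cluster, replace=True))
        return [self.transitions[i] for i in picked]

In practice the cluster assignments would be refreshed only periodically rather than on every call to sample(), since re-clustering the whole buffer dominates the cost of this sketch.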


Data availability

All data generated or analyzed during this study are available at https://github.com/jamieliuestc/MVRL.


Acknowledgements

This work is supported by the National Natural Science Foundation of China under Grants No. 61772120 and No. 62276065.

Author information


Corresponding author

Correspondence to Shiping Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 Proof of Theorem 1

Proof

$$\begin{aligned} \Vert B^{\pi }Q_{1}(h,o,a) - B^{\pi }Q_{2}(h,o,a) \Vert _{\infty }&= \max _{h,o,a} \left| \mathbb {E}_{a'\sim \pi ,o'\sim p^{\pi }}\left[ r(h,o,a) + \gamma Q_{1}(h',o',a') - r(h,o,a) - \gamma Q_{2}(h',o',a')\right] \right| \\&= \max _{h,o,a} \left| \sum _{o'\in \mathcal {O}}p(o'|h,o,a)\,\gamma \sum _{a'\in \mathcal {A}}\pi (a'|h',o') \left[ Q_{1}(h',o',a')-Q_{2}(h',o',a')\right] \right| \\&\le \max _{h,o,a} \sum _{o'\in \mathcal {O}}p(o'|h,o,a)\,\gamma \sum _{a'\in \mathcal {A}}\pi (a'|h',o') \left| Q_{1}(h',o',a')-Q_{2}(h',o',a')\right| \\&\le \max _{h,o,a} \sum _{o'\in \mathcal {O}}p(o'|h,o,a)\,\gamma \max _{\widetilde{h},\widetilde{o},\widetilde{a}} \left| Q_{1}(\widetilde{h},\widetilde{o},\widetilde{a})-Q_{2}(\widetilde{h},\widetilde{o},\widetilde{a})\right| \\&\le \gamma \max _{h,o,a}\left| Q_{1}(h,o,a)-Q_{2}(h,o,a)\right| \\&= \gamma \Vert Q_{1}(h,o,a) - Q_{2}(h,o,a) \Vert _{\infty }. \end{aligned}$$

Thus \(B^{\pi }\) in (14) is a contraction mapping in the sup-norm when the MvDP is finite. By the contraction mapping principle, \(B^{\pi }\) has a unique fixed point satisfying \(Q(h,o,a)=\mathbb {E}_{a'\sim \pi ,o'\sim p^{\pi }}[r(h,o,a)+\gamma Q(h',o',a')]\). By the Bellman equation (13), this unique fixed point is \(Q^{\pi }\). \(\square\)
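The bound above can also be checked numerically on a small finite problem. The following sketch (Python with numpy) is only a sanity check under simplifying assumptions, not code from the paper: the pair \((h,o)\) is folded into a single state index, the policy is a fixed random stochastic policy, and all names are made up for illustration.

# Numerical sanity check of the gamma-contraction of the evaluation operator.
import numpy as np

rng = np.random.default_rng(0)
nS, nA, gamma = 6, 3, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # P[s, a, s']: transition probabilities
r = rng.standard_normal((nS, nA))               # r(s, a): reward
pi = rng.dirichlet(np.ones(nA), size=nS)        # pi(a | s): fixed stochastic policy

def B_pi(Q):
    # (B^pi Q)(s, a) = r(s, a) + gamma * sum_{s'} P(s'|s, a) sum_{a'} pi(a'|s') Q(s', a')
    V = (pi * Q).sum(axis=1)                    # expected Q under pi, per state
    return r + gamma * P @ V

Q1, Q2 = rng.standard_normal((2, nS, nA))
lhs = np.abs(B_pi(Q1) - B_pi(Q2)).max()         # ||B^pi Q1 - B^pi Q2||_inf
rhs = gamma * np.abs(Q1 - Q2).max()             # gamma * ||Q1 - Q2||_inf
print(lhs <= rhs + 1e-12)                       # True: the contraction bound holds

Replacing the expectation under pi in B_pi with a maximum over actions gives the analogous check for the optimality operator of Theorem 2.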

1.2 Proof of Theorem 2

Proof

$$\begin{aligned} \Vert B Q_{1}(h,o,a) - B Q_{2}(h,o,a) \Vert _{\infty }&= \max _{h,o,a} \left| \max _{\pi } \mathbb {E}_{a'\sim \pi ,o'\sim p^{\pi }}\left[ r(h,o,a) + \gamma Q_{1}(h',o',a')\right] - \max _{\pi } \mathbb {E}_{a'\sim \pi ,o'\sim p^{\pi }}\left[ r(h,o,a) + \gamma Q_{2}(h',o',a')\right] \right| \\&\le \max _{h,o,a} \left| \max _{\pi } \sum _{o'\in \mathcal {O}}p(o'|h,o,a)\,\gamma \sum _{a'\in \mathcal {A}}\pi (a'|h',o') \left| Q_{1}(h',o',a')-Q_{2}(h',o',a')\right| \right| \\&\le \max _{h,o,a,\pi } \sum _{o'\in \mathcal {O}}p(o'|h,o,a)\,\gamma \sum _{a'\in \mathcal {A}}\pi (a'|h',o') \left| Q_{1}(h',o',a')-Q_{2}(h',o',a')\right| \\&\le \max _{h,o,a} \sum _{o'\in \mathcal {O}}p(o'|h,o,a)\,\gamma \max _{\widetilde{h},\widetilde{o},\widetilde{a}} \left| Q_{1}(\widetilde{h},\widetilde{o},\widetilde{a})-Q_{2}(\widetilde{h},\widetilde{o},\widetilde{a})\right| \\&\le \gamma \max _{h,o,a}\left| Q_{1}(h,o,a)-Q_{2}(h,o,a)\right| \\&= \gamma \Vert Q_{1}(h,o,a) - Q_{2}(h,o,a) \Vert _{\infty }. \end{aligned}$$

Therefore, \(B\) in (15) is a contraction mapping in the sup-norm when the MvDP is finite. By the contraction mapping principle, \(B\) has a unique fixed point satisfying \(Q(h,o,a)=\max _{\pi }\mathbb {E}_{a'\sim \pi ,o'\sim p^{\pi }}[r(h,o,a)+\gamma Q(h',o',a')]\). By the Bellman equation (13), this unique fixed point is \(Q^{\pi ^*}\). \(\square\)
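For completeness, the standard consequence of the contraction property used in both proofs can be made explicit; this is a routine restatement of the Banach fixed-point argument, not an additional result of the paper. For any bounded \(Q\) and any \(k\ge 1\),

$$\begin{aligned} \Vert B^{k}Q - Q^{\pi ^*} \Vert _{\infty } = \Vert B(B^{k-1}Q) - BQ^{\pi ^*} \Vert _{\infty } \le \gamma \Vert B^{k-1}Q - Q^{\pi ^*} \Vert _{\infty } \le \cdots \le \gamma ^{k}\Vert Q - Q^{\pi ^*} \Vert _{\infty }, \end{aligned}$$

which tends to zero as \(k\rightarrow \infty\) because \(0\le \gamma <1\). Hence repeated application of \(B\) converges geometrically to \(Q^{\pi ^*}\), and the same argument with \(B^{\pi }\) in place of \(B\) gives geometric convergence to \(Q^{\pi }\).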

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Li, M., Zhu, W. & Wang, S. Multi-view reinforcement learning for sequential decision-making with insufficient state information. Int. J. Mach. Learn. & Cyber. 15, 1533–1552 (2024). https://doi.org/10.1007/s13042-023-01981-9

