Abstract
In this paper, an improved RNH-QL method, based on an RBF network and heuristic Q-learning, is proposed for route searching in large state spaces. First, it addresses the inefficiency of reinforcement learning when the state space grows and no prior information about the environment is available. Second, with the RBF network serving as the weight-updating rule, reward shaping can give additional feedback to the agent in intermediate states, which helps guide the agent toward the goal state in a more controlled fashion. Meanwhile, through the Q-learning process the method accesses the underlying dynamic knowledge directly, rather than requiring background knowledge for an upper-level RBF network. Third, it improves learning efficiency by incorporating a greedy exploitation strategy to train the neural network, as demonstrated by the experimental results.
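The reward-shaping idea described above can be sketched in isolation. The following is a minimal illustration only, not the authors' RNH-QL algorithm: it uses tabular Q-learning on a toy gridworld with a potential-based shaping term (the potential function, grid size, and all helper names here are hypothetical choices for the example), where the shaping reward `gamma * phi(s') - phi(s)` gives the agent intermediate feedback on progress toward the goal without changing the optimal policy.

```python
import random

def potential(state, goal):
    """Potential function phi: negative Manhattan distance to the goal."""
    return -(abs(state[0] - goal[0]) + abs(state[1] - goal[1]))

def shaped_q_learning(size=5, goal=(4, 4), episodes=500,
                      alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a gridworld with potential-based reward shaping."""
    rng = random.Random(seed)
    moves = [(0, 1), (0, -1), (1, 0), (-1, 0)]
    Q = {}

    def q(s, a):
        return Q.get((s, a), 0.0)

    for _ in range(episodes):
        s = (0, 0)
        for _ in range(100):
            if s == goal:
                break
            # epsilon-greedy action selection
            if rng.random() < epsilon:
                a = rng.randrange(len(moves))
            else:
                a = max(range(len(moves)), key=lambda i: q(s, i))
            dx, dy = moves[a]
            s2 = (min(max(s[0] + dx, 0), size - 1),
                  min(max(s[1] + dy, 0), size - 1))
            # environment reward: small step cost, bonus at the goal
            r = 1.0 if s2 == goal else -0.01
            # shaping term F(s, s') = gamma * phi(s') - phi(s); potential-based
            # shaping of this form leaves the optimal policy unchanged
            r += gamma * potential(s2, goal) - potential(s, goal)
            best_next = max(q(s2, i) for i in range(len(moves)))
            Q[(s, a)] = q(s, a) + alpha * (r + gamma * best_next - q(s, a))
            s = s2
    return Q, moves

def greedy_path(Q, moves, start=(0, 0), goal=(4, 4), size=5, limit=20):
    """Follow the learned greedy policy and return the visited states."""
    path, s = [start], start
    for _ in range(limit):
        if s == goal:
            break
        a = max(range(len(moves)), key=lambda i: Q.get((s, i), 0.0))
        dx, dy = moves[a]
        s = (min(max(s[0] + dx, 0), size - 1),
             min(max(s[1] + dy, 0), size - 1))
        path.append(s)
    return path
```

In the full method the tabular Q function would be replaced by an RBF-network approximation, which is what lets the approach scale to the larger state spaces the paper targets.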
Acknowledgements
The work was supported by the National Natural Science Foundation of China (Grant Nos. 61372139, 61571372, 61672436), the Program for New Century Excellent Talents in University (Grant No. [2013]47), the Fundamental Research Funds for the Central Universities (Grant Nos. XDJK2016D008, XDJK2016A001, XDJK2014A009), and the Program for Excellent Talents in Scientific and Technological Activities for Overseas Scholars, Ministry of Personnel of China (Grant No. 2012-186).
Zhang, F., Duan, S. & Wang, L. Route searching based on neural networks and heuristic reinforcement learning. Cogn Neurodyn 11, 245–258 (2017). https://doi.org/10.1007/s11571-017-9423-7