Route searching based on neural networks and heuristic reinforcement learning

  • Research Article
  • Published in: Cognitive Neurodynamics

Abstract

In this paper, an improved RNH-QL method, based on an RBF network and heuristic Q-learning, is put forward for route searching in a large state space. Firstly, it addresses the inefficiency of reinforcement learning when the state space of a given problem grows and prior information about the environment is lacking. Secondly, with the RBF network used in the weight-updating rule, reward shaping can give additional feedback to the agent in some intermediate states, which helps guide the agent towards the goal state in a more controlled fashion. Meanwhile, through the Q-learning process, the method gains access to the underlying dynamic knowledge rather than requiring background knowledge for an upper-level RBF network. Thirdly, it improves learning efficiency by incorporating a greedy exploitation strategy to train the neural network, as verified by the experimental results.
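To make the components named above concrete, the sketch below combines them on a toy grid-world route-search task: Q-learning with a Gaussian RBF network as the value approximator, a potential-based shaping reward that supplies extra feedback at intermediate states, and epsilon-greedy exploitation. It is a minimal illustration under assumed settings (the grid size, RBF centres, distance-based potential and learning rates are chosen for the example), not the RNH-QL implementation evaluated in the paper.

    # Minimal sketch: Q-learning with RBF value approximation, potential-based
    # reward shaping and epsilon-greedy exploitation on a toy grid world.
    # All constants here are illustrative assumptions, not the paper's settings.
    import numpy as np

    GRID = 10                                     # assumed 10x10 grid world
    GOAL = np.array([9.0, 9.0])                   # goal cell
    ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # four axis-aligned moves

    # RBF feature map: Gaussian bumps centred on a coarse lattice of cells.
    centers = np.array([(i, j) for i in range(0, GRID, 3)
                               for j in range(0, GRID, 3)], dtype=float)
    SIGMA = 2.0

    def rbf_features(state):
        d2 = np.sum((centers - state) ** 2, axis=1)
        return np.exp(-d2 / (2.0 * SIGMA ** 2))

    # One linear weight vector per action on top of the shared RBF layer.
    weights = np.zeros((len(ACTIONS), len(centers)))

    def q_value(state, a):
        return weights[a] @ rbf_features(state)

    # Potential-based shaping: negative distance to the goal, so intermediate
    # states that move closer to the goal receive positive extra feedback.
    def potential(state):
        return -np.linalg.norm(GOAL - state)

    ALPHA, GAMMA, EPSILON = 0.05, 0.95, 0.1

    def step(state, a):
        """Deterministic grid transition with a sparse base reward."""
        nxt = np.clip(state + ACTIONS[a], 0, GRID - 1)
        done = bool(np.array_equal(nxt, GOAL))
        return nxt, (1.0 if done else -0.01), done

    def choose_action(state):
        """Epsilon-greedy exploitation of the current RBF Q-estimates."""
        if np.random.rand() < EPSILON:
            return np.random.randint(len(ACTIONS))
        return int(np.argmax([q_value(state, a) for a in range(len(ACTIONS))]))

    for episode in range(200):
        s = np.array([0.0, 0.0])
        for _ in range(200):
            a = choose_action(s)
            s2, r, done = step(s, a)
            # Shaping term F = gamma*phi(s') - phi(s) adds guidance at
            # intermediate states without changing the optimal policy.
            shaped = r + GAMMA * potential(s2) - potential(s)
            target = shaped if done else shaped + GAMMA * max(
                q_value(s2, b) for b in range(len(ACTIONS)))
            weights[a] += ALPHA * (target - q_value(s, a)) * rbf_features(s)
            s = s2
            if done:
                break

In RNH-QL the heuristic signal is learned rather than fixed in advance; the hand-coded distance potential above stands in for it purely to keep the sketch self-contained.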

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61372139, 61571372, 61672436), the Program for New Century Excellent Talents in University (Grant No. [2013]47), the Fundamental Research Funds for the Central Universities (Grant Nos. XDJK2016D008, XDJK2016A001, XDJK2014A009), and the Program for Excellent Talents in Scientific and Technological Activities for Overseas Scholars, Ministry of Personnel, China (Grant No. 2012-186).

Author information

Corresponding author

Correspondence to Shukai Duan.

About this article

Cite this article

Zhang, F., Duan, S. & Wang, L. Route searching based on neural networks and heuristic reinforcement learning. Cogn Neurodyn 11, 245–258 (2017). https://doi.org/10.1007/s11571-017-9423-7

Keywords

Navigation