Contrastive Goal Grouping for Policy Generalization in Goal-Conditioned Reinforcement Learning

  • Conference paper

Neural Information Processing (ICONIP 2021)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 13108)

Included in the following conference series: ICONIP: International Conference on Neural Information Processing

Abstract

We propose Contrastive Goal Grouping (COGOAL), a self-supervised goal embedding algorithm for learning a well-structured latent goal space that simplifies goal-conditioned reinforcement learning. Compared to conventional reconstruction-based methods such as the variational autoencoder, our approach can benefit from previously learnt goals and achieve better generalizability. More specifically, we theoretically prove a sufficient condition for determining whether goals share similar optimal policies, and propose COGOAL, which groups goals satisfying the condition in the latent space via contrastive learning. The learnt goal embeddings enable a policy fully trained for one goal to reach new goals that are adjacent in the latent space. We conduct experiments on visual navigation and visual object search tasks. COGOAL significantly outperforms the baseline methods in terms of sample efficiency in the visual object search task, in which a previously learnt policy is adaptively transferred to reach new goals with fine-tuning.
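
To make the mechanism concrete, the following is a minimal sketch (not the authors' implementation) of the contrastive grouping step: goal observations are embedded by a hypothetical GoalEncoder, and a margin-based contrastive loss in the style of Hadsell et al. pulls together embeddings of goal pairs labelled as satisfying the similar-optimal-policy condition and pushes apart the rest. The network sizes, the loss form, and the labelling procedure are illustrative assumptions.

```python
# Minimal sketch of contrastive goal grouping (illustrative; not the authors' code).
# Assumption: a pair of goals is labelled "same group" (1.0) when it satisfies the
# paper's similar-optimal-policy condition, and 0.0 otherwise.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GoalEncoder(nn.Module):
    """Maps a goal observation (here, a feature vector) to a unit-norm latent embedding."""

    def __init__(self, obs_dim: int, latent_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.net(obs), dim=-1)


def contrastive_grouping_loss(z_a, z_b, same_group, margin: float = 1.0):
    """Pull same-group goal embeddings together; push different-group ones apart.

    z_a, z_b:    (batch, latent_dim) embeddings of two goals
    same_group:  (batch,) 1.0 if the pair satisfies the grouping condition, else 0.0
    """
    dist = torch.norm(z_a - z_b, dim=-1)
    pos = same_group * dist.pow(2)                           # attract positive pairs
    neg = (1.0 - same_group) * F.relu(margin - dist).pow(2)  # repel negatives up to a margin
    return (pos + neg).mean()


# Usage sketch: embed two batches of goal observations and update the encoder.
encoder = GoalEncoder(obs_dim=128)
optimizer = torch.optim.Adam(encoder.parameters(), lr=3e-4)
goals_a, goals_b = torch.randn(64, 128), torch.randn(64, 128)
same_group = (torch.rand(64) < 0.5).float()  # placeholder labels for the condition
loss = contrastive_grouping_loss(encoder(goals_a), encoder(goals_b), same_group)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

A downstream goal-conditioned policy would then receive the embedding of its target goal as input, so that a policy trained for one goal can be fine-tuned to reach goals that lie nearby in the latent space.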

Notes

  1. The conditional expectation is denoted as \(\mathbb{E}_Z\left[ X \mid Y \right]\), which represents the expected value of \(X\) given \(Y\), taken over \(Z\).
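
     As a hypothetical illustration of this notation (not taken from the paper), the expected next-state value under policy \(\pi\) can be written as \(\mathbb{E}_{s'}\left[ V^{\pi}(s') \mid s, a \right] = \sum_{s'} P(s' \mid s, a)\, V^{\pi}(s')\), i.e., the expected value of \(V^{\pi}(s')\) given \((s, a)\), taken over the next state \(s'\).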

Acknowledgments

This work was supported by China Scholarship Council (Grant No. 202008050300).

Author information

Corresponding authors

Correspondence to Qiming Zou or Einoshin Suzuki.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Zou, Q., Suzuki, E. (2021). Contrastive Goal Grouping for Policy Generalization in Goal-Conditioned Reinforcement Learning. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Lecture Notes in Computer Science, vol 13108. Springer, Cham. https://doi.org/10.1007/978-3-030-92185-9_20

  • DOI: https://doi.org/10.1007/978-3-030-92185-9_20

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-92184-2

  • Online ISBN: 978-3-030-92185-9

  • eBook Packages: Computer Science, Computer Science (R0)
