Contrastive Goal Grouping for Policy Generalization in Goal-Conditioned Reinforcement Learning

  • Conference paper

Neural Information Processing (ICONIP 2021)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 13108)

Included in the following conference series: ICONIP: International Conference on Neural Information Processing

Abstract

We propose Contrastive Goal Grouping (COGOAL), a self-supervised goal embedding algorithm for learning a well-structured latent goal space that simplifies goal-conditioned reinforcement learning. Compared to conventional reconstruction-based methods such as the variational autoencoder, our approach can benefit from previously learnt goals and achieve better generalizability. More specifically, we theoretically prove a sufficient condition for determining whether goals share similar optimal policies, and propose COGOAL, which groups goals satisfying the condition in the latent space via contrastive learning. The learnt goal embeddings enable a policy fully trained for one goal to reach new goals that are adjacent in the latent space. We conduct experiments on visual navigation and visual object search tasks. COGOAL significantly outperforms the baseline methods in terms of sample efficiency in the visual object search task, in which a previously learnt policy is adaptively transferred to reach new goals with fine-tuning.
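
To make the mechanism concrete, the following is a minimal sketch (not the authors' implementation) of the contrastive grouping step: goal observations are embedded by a hypothetical GoalEncoder, and a margin-based contrastive loss in the style of Hadsell et al. pulls together embeddings of goal pairs labelled as satisfying the similar-optimal-policy condition and pushes apart the rest. The network sizes, the loss form, and the labelling procedure are illustrative assumptions.

```python
# Minimal sketch of contrastive goal grouping (illustrative; not the authors' code).
# Assumption: a pair of goals is labelled "same group" (1.0) when it satisfies the
# paper's similar-optimal-policy condition, and 0.0 otherwise.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GoalEncoder(nn.Module):
    """Maps a goal observation (here, a feature vector) to a unit-norm latent embedding."""

    def __init__(self, obs_dim: int, latent_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.net(obs), dim=-1)


def contrastive_grouping_loss(z_a, z_b, same_group, margin: float = 1.0):
    """Pull same-group goal embeddings together; push different-group ones apart.

    z_a, z_b:    (batch, latent_dim) embeddings of two goals
    same_group:  (batch,) 1.0 if the pair satisfies the grouping condition, else 0.0
    """
    dist = torch.norm(z_a - z_b, dim=-1)
    pos = same_group * dist.pow(2)                           # attract positive pairs
    neg = (1.0 - same_group) * F.relu(margin - dist).pow(2)  # repel negatives up to a margin
    return (pos + neg).mean()


# Usage sketch: embed two batches of goal observations and update the encoder.
encoder = GoalEncoder(obs_dim=128)
optimizer = torch.optim.Adam(encoder.parameters(), lr=3e-4)
goals_a, goals_b = torch.randn(64, 128), torch.randn(64, 128)
same_group = (torch.rand(64) < 0.5).float()  # placeholder labels for the condition
loss = contrastive_grouping_loss(encoder(goals_a), encoder(goals_b), same_group)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

A downstream goal-conditioned policy would then receive the embedding of its target goal as input, so that a policy trained for one goal can be fine-tuned to reach goals that lie nearby in the latent space.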

Notes

  1. The conditional expectation is denoted as \(\mathbb{E}_Z\left[ X \mid Y \right]\), which represents the expected value of \(X\) given \(Y\), taken over \(Z\).
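
     As a hypothetical illustration of this notation (not taken from the paper), the expected next-state value under policy \(\pi\) can be written as \(\mathbb{E}_{s'}\left[ V^{\pi}(s') \mid s, a \right] = \sum_{s'} P(s' \mid s, a)\, V^{\pi}(s')\), i.e., the expected value of \(V^{\pi}(s')\) given \((s, a)\), taken over the next state \(s'\).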

Acknowledgments

This work was supported by China Scholarship Council (Grant No. 202008050300).

Author information

Corresponding authors

Correspondence to Qiming Zou or Einoshin Suzuki.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Zou, Q., Suzuki, E. (2021). Contrastive Goal Grouping for Policy Generalization in Goal-Conditioned Reinforcement Learning. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Lecture Notes in Computer Science, vol 13108. Springer, Cham. https://doi.org/10.1007/978-3-030-92185-9_20

  • DOI: https://doi.org/10.1007/978-3-030-92185-9_20

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-92184-2

  • Online ISBN: 978-3-030-92185-9

  • eBook Packages: Computer Science, Computer Science (R0)
