Abstract
We propose Contrastive Goal Grouping (COGOAL), a self-supervised goal-embedding algorithm that learns a well-structured latent goal space to simplify goal-conditioned reinforcement learning. Compared with conventional reconstruction-based methods such as the variational autoencoder, our approach benefits from previously learnt goals and generalizes better. More specifically, we theoretically prove a sufficient condition for determining whether goals share similar optimal policies, and propose COGOAL, which groups goals satisfying the condition in the latent space via contrastive learning. The learnt goal embeddings enable a policy fully trained for one goal to reach new goals that are adjacent in the latent space. We conduct experiments on visual navigation and visual object search tasks. COGOAL significantly outperforms the baseline methods in sample efficiency on the visual object search task, in which a previously learnt policy is transferred to new goals with fine-tuning.
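The grouping step described above can be illustrated with a generic pairwise contrastive objective in the style of Hadsell et al.: goals judged to share similar optimal policies are pulled together in the latent space, all other pairs are pushed apart. This is a minimal sketch only; the linear encoder, the margin value, and the toy goal data are illustrative assumptions, not the paper's actual COGOAL architecture or loss.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(goals, W):
    """Toy linear goal encoder (stand-in for a learned network); rows are goals."""
    z = goals @ W
    return z / np.linalg.norm(z, axis=1, keepdims=True)  # unit-norm embeddings

def contrastive_loss(z, pos_pairs, margin=0.5):
    """Pairwise contrastive loss: attract positive pairs (goals assumed to
    share similar optimal policies), repel all other pairs within a margin."""
    pos = set(map(tuple, pos_pairs))
    loss = 0.0
    n = len(z)
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(z[i] - z[j])
            if (i, j) in pos:
                loss += d ** 2                         # pull positives together
            else:
                loss += max(0.0, margin - d) ** 2      # push negatives apart
    return loss

# Toy data: four goals forming two groups that should be co-located in latent space.
goals = rng.normal(size=(4, 8))
goals[1] = goals[0] + 0.01 * rng.normal(size=8)  # goal 1 resembles goal 0
goals[3] = goals[2] + 0.01 * rng.normal(size=8)  # goal 3 resembles goal 2
W = rng.normal(size=(8, 2))
z = embed(goals, W)
print(contrastive_loss(z, pos_pairs=[(0, 1), (2, 3)]))
```

In a full implementation, the loss would be minimized over the encoder parameters (here `W`), so that a policy trained for one goal transfers to neighbours of that goal in the embedding.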
Notes
- 1.
The conditional expectation is denoted \(\mathbb{E}_Z\left[X \mid Y\right]\), which represents the expected value of \(X\) given \(Y\), taken over \(Z\).
Acknowledgments
This work was supported by China Scholarship Council (Grant No. 202008050300).
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Zou, Q., Suzuki, E. (2021). Contrastive Goal Grouping for Policy Generalization in Goal-Conditioned Reinforcement Learning. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Lecture Notes in Computer Science, vol. 13108. Springer, Cham. https://doi.org/10.1007/978-3-030-92185-9_20
DOI: https://doi.org/10.1007/978-3-030-92185-9_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-92184-2
Online ISBN: 978-3-030-92185-9
eBook Packages: Computer Science (R0)