Abstract
Unsupervised contrastive learning of sentence embeddings has recently become a focus of research. However, existing methods still suffer from an unreasonable division of positive and negative samples and from poor data augmentation that alters the semantics of the text. We propose an optimized data augmentation method that combines the data augmentation of contrastive learning with the distillation of unsupervised sentence-pair modelling. Our augmentation builds positive examples from in-sentence tokens and selects negative examples by text similarity, while the distillation is conducted without supervised sentence pairs. Experimental results show that our method achieves a Spearman correlation of 81.03%, outperforming existing baselines on the STS benchmarks.
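The contrastive objective underlying this line of work can be sketched with a generic InfoNCE loss, where each sentence is paired with an augmented positive view and all other sentences in the batch act as negatives. This is a minimal illustration of the standard setup, not the paper's actual model: the embeddings below are random stand-ins, and the augmentation is simulated by adding small noise.

```python
import numpy as np

def cosine_sim(a, b):
    # Pairwise cosine similarity between the rows of a and the rows of b.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def info_nce_loss(anchors, positives, temperature=0.05):
    """InfoNCE contrastive loss: anchors[i] should match positives[i];
    every other row of `positives` serves as an in-batch negative."""
    sims = cosine_sim(anchors, positives) / temperature   # (N, N)
    # Numerically stable row-wise log-softmax; the diagonal holds
    # the positive pair for each anchor.
    logits = sims - sims.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
anchors = rng.normal(size=(8, 16))
# Stand-in for an augmented positive view of each sentence.
positives = anchors + 0.01 * rng.normal(size=(8, 16))
loss = info_nce_loss(anchors, positives)
print(f"contrastive loss: {loss:.4f}")
```

In practice the anchors and positives would be encoder outputs for two views of the same sentence, and hard negatives could be chosen by text similarity rather than taken uniformly from the batch.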
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Ding, Y., **, R., Paik, JY., Chung, TS. (2024). Unsupervised Contrastive Learning of Sentence Embeddings Through Optimized Sample Construction and Knowledge Distillation. In: Liu, F., Sadanandan, A.A., Pham, D.N., Mursanto, P., Lukose, D. (eds) PRICAI 2023: Trends in Artificial Intelligence. PRICAI 2023. Lecture Notes in Computer Science, vol 14326. Springer, Singapore. https://doi.org/10.1007/978-981-99-7022-3_35
Print ISBN: 978-981-99-7021-6
Online ISBN: 978-981-99-7022-3