A Broad Study of Pre-training for Domain Generalization and Adaptation

  • Conference paper
  • Computer Vision – ECCV 2022 (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13693)

Included in the conference series: European Conference on Computer Vision

Abstract

Deep models must learn robust and transferable representations in order to perform well on new domains. While domain transfer methods (e.g., domain adaptation, domain generalization) have been proposed to learn transferable representations across domains, they are typically applied to ResNet backbones pre-trained on ImageNet. Thus, existing works pay little attention to the effects of pre-training on domain transfer tasks. In this paper, we provide a broad study and in-depth analysis of pre-training for domain adaptation and generalization, covering network architecture, model size, pre-training loss, and pre-training dataset. We observe that simply using a state-of-the-art backbone outperforms existing state-of-the-art domain adaptation baselines and sets new baselines on Office-Home and DomainNet, improving accuracy by 10.7% and 5.5%, respectively. We hope that this work can provide more insights for future domain transfer research.
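
To make the headline result concrete, the sketch below shows the kind of source-only baseline the abstract refers to: fine-tune an ImageNet-pre-trained backbone on labeled source-domain data, then evaluate it directly on the target domain, so that changing only the backbone name reproduces the central comparison. This is a minimal illustration, not the authors' released code; it assumes PyTorch with the timm model zoo, and the source_only_baseline helper, the data loaders, and the hyperparameters are placeholders.

    # Hypothetical sketch of a source-only baseline (assumes PyTorch + timm).
    import timm
    import torch
    import torch.nn as nn

    def source_only_baseline(backbone_name, num_classes, source_loader,
                             target_loader, epochs=10, device="cuda"):
        # Any ImageNet-pre-trained timm backbone can be swapped in by name,
        # e.g. "resnet50", "convnext_base", or "swin_base_patch4_window7_224".
        model = timm.create_model(backbone_name, pretrained=True,
                                  num_classes=num_classes).to(device)
        optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
        criterion = nn.CrossEntropyLoss()

        # Fine-tune on the labeled source domain only.
        model.train()
        for _ in range(epochs):
            for images, labels in source_loader:
                images, labels = images.to(device), labels.to(device)
                optimizer.zero_grad()
                criterion(model(images), labels).backward()
                optimizer.step()

        # Measure transfer on the target domain without any adaptation step.
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for images, labels in target_loader:
                predictions = model(images.to(device)).argmax(dim=1).cpu()
                correct += (predictions == labels).sum().item()
                total += labels.numel()
        return correct / total

The paper's observation is that this plain recipe, with a modern backbone such as ConvNeXt or Swin substituted for ResNet-50, already outperforms dedicated domain adaptation methods on Office-Home and DomainNet.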

Acknowledgements

This work was supported by DARPA LwLL and NSF Award No. 1535797.

Author information

Corresponding author: Donghyun Kim.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 1386 KB)

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Kim, D., Wang, K., Sclaroff, S., Saenko, K. (2022). A Broad Study of Pre-training for Domain Generalization and Adaptation. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13693. Springer, Cham. https://doi.org/10.1007/978-3-031-19827-4_36

  • DOI: https://doi.org/10.1007/978-3-031-19827-4_36

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19826-7

  • Online ISBN: 978-3-031-19827-4

  • eBook Packages: Computer Science, Computer Science (R0)
