Abstract
With the increasing abundance of pretrained models in recent years, the problem of selecting the best pretrained checkpoint for a particular downstream classification task has been attracting growing attention. Although several methods have recently been proposed to tackle the selection problem (e.g., LEEP, H-score), these methods resort to heuristics that are not well motivated by learning theory. In this paper we present PACTran, a theoretically grounded family of metrics for pretrained model selection and transferability measurement. We first show how to derive PACTran metrics from the optimal PAC-Bayesian bound under the transfer learning setting. We then empirically evaluate three metric instantiations of PACTran on a number of vision tasks (VTAB) as well as a language-and-vision (OKVQA) task. An analysis of the results shows PACTran is a more consistent and effective transferability measure compared to existing selection methods.
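To make the checkpoint-selection setting concrete, the sketch below ranks candidate checkpoints by a PAC-Bayes-flavored score: the regularized training loss of a linear head fit on each checkpoint's frozen features, i.e., an "empirical risk + complexity" shape loosely mirroring a PAC-Bayesian bound. This is an illustrative stand-in, not the paper's exact PACTran metric; the function names and hyperparameters (`lam`, `epochs`, `lr`) are assumptions for the example.

```python
import numpy as np

def pac_bayes_style_score(features, labels, num_classes, lam=1.0, epochs=100, lr=0.1):
    """Hypothetical transferability score (lower is better): regularized
    multinomial logistic loss of a linear head on frozen features.
    The 'nll + complexity' shape loosely mirrors a PAC-Bayes bound;
    this is NOT the exact PACTran metric from the paper."""
    n, d = features.shape
    W = np.zeros((d, num_classes))
    Y = np.eye(num_classes)[labels]  # one-hot targets
    for _ in range(epochs):
        logits = features @ W
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        grad = features.T @ (P - Y) / n + lam * W / n  # L2-regularized gradient
        W -= lr * grad
    logits = features @ W
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(n), labels].mean()      # empirical risk term
    complexity = lam * (W ** 2).sum() / (2 * n)        # complexity penalty
    return nll + complexity

def select_checkpoint(feature_sets, labels, num_classes):
    """Rank candidate checkpoints (one feature matrix each) by the score;
    return the index of the best-scoring checkpoint."""
    scores = [pac_bayes_style_score(F, labels, num_classes) for F in feature_sets]
    return int(np.argmin(scores))
```

In this setup, features that linearly separate the downstream labels yield a low empirical risk with a small head norm, so the checkpoint that produced them scores best without any full fine-tuning run.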
References
Abadi, M., et al.: TensorFlow: large-scale machine learning on heterogeneous systems (2015). https://www.tensorflow.org/. Software available from tensorflow.org
Antol, S., et al.: VQA: visual question answering. In: ICCV (2015)
Bao, Y., et al.: An information-theoretic approach to transferability in task transfer learning. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 2309–2313. IEEE (2019)
Blei, D.M., Kucukelbir, A., McAuliffe, J.D.: Variational inference: a review for statisticians. J. Am. Stat. Assoc. 112(518), 859–877 (2017)
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
Bommasani, R., et al.: On the opportunities and risks of foundation models (2021)
Bousquet, O., Boucheron, S., Lugosi, G.: Introduction to statistical learning theory. In: Bousquet, O., von Luxburg, U., Rätsch, G. (eds.) ML 2003. LNCS (LNAI), vol. 3176, pp. 169–207. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-28650-9_8
Brock, A., Donahue, J., Simonyan, K.: Large scale GAN training for high fidelity natural image synthesis (2019)
Brown, T., et al.: Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 33, 1877–1901 (2020)
Changpinyo, S., Kukliansky, D., Szpektor, I., Chen, X., Ding, N., Soricut, R.: All you may need for VQA are image captions. In: NAACL (2022)
Ding, N., Chen, X., Levinboim, T., Goodman, S., Soricut, R.: Bridging the gap between practice and PAC-Bayes theory in few-shot meta-learning. Adv. Neural Inf. Process. Syst. 34, 29506–29516 (2021)
Doersch, C., Gupta, A., Efros, A.A.: Unsupervised visual representation learning by context prediction (2016)
Dosovitskiy, A., Springenberg, J.T., Riedmiller, M., Brox, T.: Discriminative unsupervised feature learning with convolutional neural networks. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems. vol. 27. Curran Associates, Inc. (2014)
Dziugaite, G.K., Roy, D.M.: Computing nonvacuous generalization bounds for deep (stochastic) neural networks with many more parameters than training data. arXiv preprint arXiv:1703.11008 (2017)
Fei-Fei, L., Fergus, R., Perona, P.: Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. In: Computer Vision and Pattern Recognition Workshop (2004)
Germain, P., Bach, F., Lacoste, A., Lacoste-Julien, S.: PAC-Bayesian theory meets Bayesian inference. Adv. Neural Inf. Process. Syst. 29, 1884–1891 (2016)
Germain, P., Lacasse, A., Laviolette, F., Marchand, M.: PAC-Bayesian learning of linear classifiers. In: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 353–360 (2009)
Gidaris, S., Singh, P., Komodakis, N.: Unsupervised representation learning by predicting image rotations (2018)
Goyal, Y., Khot, T., Summers-Stay, D., Batra, D., Parikh, D.: Making the V in VQA matter: elevating the role of image understanding in visual question answering. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016). https://doi.org/10.1109/CVPR.2016.90
He, K., Zhang, X., Ren, S., Sun, J.: Identity mappings in deep residual networks (2016)
Huang, S.L., Makur, A., Wornell, G.W., Zheng, L.: On universal features for high-dimensional learning and inference. arXiv preprint arXiv:1911.09105 (2019)
Hudson, D.A., Manning, C.D.: GQA: a new dataset for real-world visual reasoning and compositional question answering. In: CVPR (2019)
Jiang, Y., Neyshabur, B., Mobahi, H., Krishnan, D., Bengio, S.: Fantastic generalization measures and where to find them. In: ICLR (2020)
Kingma, D.P., Welling, M.: Auto-encoding variational Bayes (2014)
Krizhevsky, A.: Learning multiple layers of features from tiny images. University of Toronto, Technical Report (2009)
LeCun, Y., Huang, F.J., Bottou, L.: Learning methods for generic object recognition with invariance to pose and lighting. In: Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. II-104 (2004)
Li, Y., et al.: Ranking neural checkpoints. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2663–2673 (2021)
Marino, K., Rastegari, M., Farhadi, A., Mottaghi, R.: OK-VQA: a visual question answering benchmark requiring external knowledge. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
McAllester, D.A.: Some PAC-Bayesian theorems. Mach. Learn. 37(3), 355–363 (1999)
Neyshabur, B., Bhojanapalli, S., McAllester, D., Srebro, N.: Exploring generalization in deep learning. Adv. Neural Inf. Process. Syst. 30 (2017)
Nguyen, C., Hassner, T., Seeger, M., Archambeau, C.: LEEP: a new measure to evaluate transferability of learned representations. In: International Conference on Machine Learning, pp. 7294–7305. PMLR (2020)
Nilsback, M.E., Zisserman, A.: Automated flower classification over a large number of classes. In: Proceedings of the Indian Conference on Computer Vision, Graphics and Image Processing, December 2008
Noroozi, M., Favaro, P.: Unsupervised learning of visual representations by solving jigsaw puzzles (2017)
Parkhi, O.M., Vedaldi, A., Zisserman, A., Jawahar, C.V.: Cats and dogs. In: IEEE Conference on Computer Vision and Pattern Recognition (2012)
Raffel, C., et al.: Exploring the limits of transfer learning with a unified text-to-text transformer. JMLR (2020)
Rothfuss, J., Fortuin, V., Josifoski, M., Krause, A.: PACOH: Bayes-optimal meta-learning with PAC-guarantees. In: International Conference on Machine Learning, pp. 9116–9126. PMLR (2021)
Rubenstein, P., Bousquet, O., Djolonga, J., Riquelme, C., Tolstikhin, I.O.: Practical and consistent estimation of f-divergences. In: Advances in Neural Information Processing Systems, vol. 32 (2019)
Sawyer-Lee, R., Gimenez, F., Hoogi, A., Rubin, D.: Curated breast imaging subset of DDSM (2016). https://doi.org/10.7937/k9/tcia.2016.7o02s9cy
Tolstikhin, I., Bousquet, O., Gelly, S., Schoelkopf, B.: Wasserstein auto-encoders (2019)
Tran, A.T., Nguyen, C.V., Hassner, T.: Transferability and hardness of supervised classification tasks. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1395–1405 (2019)
Tripuraneni, N., Jordan, M., Jin, C.: On the theory of transfer learning: the importance of task diversity. Adv. Neural Inf. Process. Syst. 33, 7852–7862 (2020)
Tsuzuku, Y., Sato, I., Sugiyama, M.: Normalized flat minima: exploring scale invariant definition of flat minima for neural networks using PAC-Bayesian analysis. In: Proceedings of the 37th International Conference on Machine Learning, pp. 9636–9647 (2020)
Vaswani, A., et al.: Attention is all you need. In: NeurIPS (2017)
Veeling, B.S., Linmans, J., Winkens, J., Cohen, T., Welling, M.: Rotation equivariant CNNs for digital pathology (2018). https://doi.org/10.1007/978-3-030-00934-2_24
Wolf, T., et al.: HuggingFace's Transformers: state-of-the-art natural language processing. arXiv preprint arXiv:1910.03771 (2019)
Xiao, J., Hays, J., Ehinger, K.A., Oliva, A., Torralba, A.: SUN database: large-scale scene recognition from abbey to zoo. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 3485–3492, June 2010. https://doi.org/10.1109/CVPR.2010.5539970
You, K., Liu, Y., Wang, J., Long, M.: LogME: practical assessment of pre-trained models for transfer learning. In: International Conference on Machine Learning, pp. 12133–12143. PMLR (2021)
Zhai, X., Oliver, A., Kolesnikov, A., Beyer, L.: S4L: self-supervised semi-supervised learning. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 1476–1485 (2019). https://doi.org/10.1109/ICCV.2019.00156
Zhai, X., et al.: A large-scale study of representation learning with the visual task adaptation benchmark. arXiv preprint arXiv:1910.04867 (2019)
Zhang, C., Bengio, S., Hardt, M., Recht, B., Vinyals, O.: Understanding deep learning (still) requires rethinking generalization. Commun. ACM 64(3), 107–115 (2021)
Zhu, Y., Groth, O., Bernstein, M., Li, F.F.: Visual7W: grounded question answering in images. In: CVPR (2016)
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Ding, N., Chen, X., Levinboim, T., Changpinyo, S., Soricut, R. (2022). PACTran: PAC-Bayesian Metrics for Estimating the Transferability of Pretrained Models to Classification Tasks. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13694. Springer, Cham. https://doi.org/10.1007/978-3-031-19830-4_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19829-8
Online ISBN: 978-3-031-19830-4