Learning Semantic Correspondence with Sparse Annotations

  • Conference paper
Computer Vision – ECCV 2022 (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13674)

Abstract

Finding dense semantic correspondence is a fundamental problem in computer vision, which remains challenging in complex scenes due to background clutter, extreme intra-class variation, and a severe lack of ground truth. In this paper, we aim to address the challenge of label sparsity in semantic correspondence by enriching supervision signals from sparse keypoint annotations. To this end, we first propose a teacher-student learning paradigm for generating dense pseudo-labels and then develop two novel strategies for denoising pseudo-labels. In particular, we use spatial priors around the sparse annotations to suppress the noisy pseudo-labels. In addition, we introduce a loss-driven dynamic label selection strategy for label denoising. We instantiate our paradigm with two variants of learning strategies: a single offline teacher setting and a mutual online teachers setting. Our approach achieves notable improvements on three challenging benchmarks for semantic correspondence and establishes a new state of the art. Project page: https://shuaiyihuang.github.io/publications/SCorrSAN.
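
The two denoising ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the circular neighbourhood used as the spatial prior, the `radius` and `keep_ratio` parameters, and the endpoint-error loss between student and teacher flow are all assumptions made for illustration. The sketch keeps a teacher's dense pseudo-labels only near the sparse keypoints, then keeps only the candidates on which the student's loss is smallest.

```python
import math


def spatial_prior_mask(height, width, keypoints, radius):
    """Mark pixels within `radius` of any annotated keypoint (x, y).

    Assumed form of the spatial prior: a simple circular neighbourhood.
    """
    return [
        [
            any((x - kx) ** 2 + (y - ky) ** 2 <= radius ** 2 for kx, ky in keypoints)
            for x in range(width)
        ]
        for y in range(height)
    ]


def endpoint_error(student_flow, teacher_flow):
    """Per-pixel Euclidean distance between two dense flow fields.

    Each flow is a grid of (dx, dy) pairs; the teacher flow plays the
    role of the dense pseudo-label (hypothetical loss choice).
    """
    return [
        [math.hypot(s[0] - t[0], s[1] - t[1]) for s, t in zip(s_row, t_row)]
        for s_row, t_row in zip(student_flow, teacher_flow)
    ]


def small_loss_selection(loss, candidate_mask, keep_ratio):
    """Keep the `keep_ratio` fraction of candidate pixels with the lowest loss."""
    candidates = sorted(
        (loss[y][x], y, x)
        for y, row in enumerate(candidate_mask)
        for x, is_candidate in enumerate(row)
        if is_candidate
    )
    selected = [[False] * len(row) for row in candidate_mask]
    if not candidates:
        return selected
    k = max(1, round(keep_ratio * len(candidates)))
    for _, y, x in candidates[:k]:
        selected[y][x] = True
    return selected


if __name__ == "__main__":
    # Toy 5x5 example with one keypoint at (x=2, y=2).
    teacher = [[(0.0, 0.0)] * 5 for _ in range(5)]
    student = [[(float(x), float(y)) for x in range(5)] for y in range(5)]
    prior = spatial_prior_mask(5, 5, [(2, 2)], radius=1)
    loss = endpoint_error(student, teacher)
    kept = small_loss_selection(loss, prior, keep_ratio=0.5)
    print(sum(map(sum, prior)), "candidates,", sum(map(sum, kept)), "kept")
```

In a training loop, only the pixels marked in the final mask would contribute to the pseudo-label loss. A "dynamic" variant could vary `keep_ratio` over training, although the schedule used in the paper is not stated in the abstract.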

Corresponding author

Correspondence to Shuaiyi Huang.

Electronic supplementary material

Supplementary material 1 (PDF, 12768 KB)

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Huang, S., Yang, L., He, B., Zhang, S., He, X., Shrivastava, A. (2022). Learning Semantic Correspondence with Sparse Annotations. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13674. Springer, Cham. https://doi.org/10.1007/978-3-031-19781-9_16

  • DOI: https://doi.org/10.1007/978-3-031-19781-9_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19780-2

  • Online ISBN: 978-3-031-19781-9

  • eBook Packages: Computer Science (R0)
