Overall-Distinctive GCN for Social Relation Recognition on Videos

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13833)

Included in the following conference series: MultiMedia Modeling (MMM 2023)


Abstract

Recognizing social relationships between multiple characters in videos can help intelligent systems serve human society better. Previous studies mainly focus on still images to classify relationships, ignoring video as an important data source. With the growth of multimedia, methods for video-based social relationship recognition have gradually emerged. However, these methods focus either only on logical reasoning among multiple characters or only on the direct interaction within each character pair. To address this, inspired by the rules of interpersonal social communication, we propose the Overall-Distinctive GCN (OD-GCN) to recognize the relationships of multiple characters in videos. Specifically, we first construct an overall-level character heterogeneous graph with two types of edges and rich node representation features to capture the implicit relationships among all characters. Then, we design an attention module that selects the relevant nodes for each character pair from feature sequences fused with temporal information. Further, we build distinctive-level graphs that focus on the interaction between the two characters of a pair. Finally, we integrate multimodal global features to classify the relationships. We conduct experiments on the MovieGraphs dataset and validate the effectiveness of our proposed model.
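The abstract describes a two-level pipeline: an overall-level heterogeneous graph over all characters, a temporal attention step that picks out the nodes relevant to a given character pair, a distinctive-level graph over that pair, and a classifier over fused multimodal global features. The following is a minimal PyTorch sketch of that flow, not the authors' implementation: the module names, feature dimensions, dense-adjacency GCN layer, the choice of two edge types, and the use of multi-head attention for the temporal step are all illustrative assumptions.

```python
# Minimal sketch of the OD-GCN pipeline described in the abstract.
# All names, dimensions, and layer choices are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleGCNLayer(nn.Module):
    """One graph-convolution step: mean aggregation over a dense adjacency matrix."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x: (num_nodes, in_dim), adj: (num_nodes, num_nodes) with self-loops.
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1.0)
        return F.relu(self.linear((adj @ x) / deg))


class ODGCNSketch(nn.Module):
    def __init__(self, node_dim=256, hidden_dim=128, num_relations=15):
        super().__init__()
        # Overall level: one GCN per edge type of the heterogeneous character
        # graph (two edge types assumed, e.g. co-occurrence and dialogue).
        self.overall_gcns = nn.ModuleList(
            [SimpleGCNLayer(node_dim, hidden_dim) for _ in range(2)]
        )
        # Attention over the temporal feature sequence, queried by the pair's nodes.
        self.pair_attn = nn.MultiheadAttention(hidden_dim, num_heads=4, batch_first=True)
        # Distinctive level: a GCN over the two-character subgraph.
        self.distinct_gcn = SimpleGCNLayer(hidden_dim, hidden_dim)
        # Classifier over pair features concatenated with a multimodal global
        # feature (e.g. pooled visual/audio/subtitle embedding; assumed given).
        self.classifier = nn.Linear(2 * hidden_dim + node_dim, num_relations)

    def forward(self, node_feats, adjs, temporal_feats, pair_idx, global_feat):
        # node_feats: (N, node_dim); adjs: list of two (N, N) adjacency matrices;
        # temporal_feats: (T, hidden_dim), assumed already projected to hidden_dim.
        h = sum(gcn(node_feats, adj) for gcn, adj in zip(self.overall_gcns, adjs))

        # Select the temporal evidence relevant to this character pair.
        query = h[list(pair_idx)].unsqueeze(0)            # (1, 2, hidden)
        keys = temporal_feats.unsqueeze(0)                # (1, T, hidden)
        attended, _ = self.pair_attn(query, keys, keys)   # (1, 2, hidden)

        # Distinctive-level graph: fully connected graph over the two characters.
        pair_nodes = attended.squeeze(0)                  # (2, hidden)
        pair_adj = torch.ones(2, 2, device=pair_nodes.device)
        pair_h = self.distinct_gcn(pair_nodes, pair_adj)  # (2, hidden)

        fused = torch.cat([pair_h.flatten(), global_feat], dim=-1)
        return self.classifier(fused)                     # (num_relations,)


# Toy usage: 5 characters, a 20-step temporal sequence, random features.
model = ODGCNSketch()
nodes = torch.randn(5, 256)
adjs = [torch.eye(5), torch.eye(5)]   # two edge types; self-loops only here
temporal = torch.randn(20, 128)
logits = model(nodes, adjs, temporal, pair_idx=(0, 3), global_feat=torch.randn(256))
```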




Acknowledgments

This work is supported by the NSFC-General Technology Basic Research Joint Funds under Grant U1936220, the National Natural Science Foundation of China under Grant 61972047, and the National Key Research and Development Program of China under Grant 2018YFC0831500.

Author information

Corresponding author

Correspondence to Bin Wu.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Hu, Y., Cao, C., Li, F., Yan, C., Qi, J., Wu, B. (2023). Overall-Distinctive GCN for Social Relation Recognition on Videos. In: Dang-Nguyen, DT., et al. MultiMedia Modeling. MMM 2023. Lecture Notes in Computer Science, vol 13833. Springer, Cham. https://doi.org/10.1007/978-3-031-27077-2_5


  • DOI: https://doi.org/10.1007/978-3-031-27077-2_5

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-27076-5

  • Online ISBN: 978-3-031-27077-2

  • eBook Packages: Computer Science, Computer Science (R0)
