
MCDGait: multimodal co-learning distillation network with spatial-temporal graph reasoning for gait recognition in the wild

  • Research
  • Published in The Visual Computer

Abstract

Gait recognition in the wild has attracted the attention of the academic community. However, existing unimodal algorithms cannot achieve the same performance on in-the-wild datasets as on in-the-lab datasets, because unimodal data have many limitations in in-the-wild environments. Therefore, we propose a multimodal approach that combines silhouettes and skeletons, and we formulate multimodal gait recognition as a multimodal co-learning problem. In particular, we propose a multimodal co-learning distillation network (MCDGait) that integrates two sub-networks processing unimodal data into a single fusion network. Based on the semantic consistency of the different modalities and the paradigm of deep mutual learning, the performance of the entire network is continuously improved via bidirectional knowledge distillation between the sub-networks and the fusion network. Inspired by the observation that specific body parts or joints exhibit unique motion characteristics and are linked with other parts or joints during walking, we propose a spatial-temporal graph reasoning module (ST-GRM). This module represents the parts or joints as graph nodes and the motion linkages between them as edges. By utilizing a dynamic graph generator, the module implicitly captures the dynamic changes of the human body. Based on the generated graphs, the independent spatial-temporal linkage feature of each part and the interactive spatial-temporal linkage features between parts are aggregated simultaneously. Extensive experiments conducted on two in-the-wild datasets demonstrate the state-of-the-art performance of the proposed method: the average rank-1 accuracy on Gait3D and GREW is 50.90% and 58.06%, respectively. The source code can be obtained from https://github.com/BoyeXiong/MCDGait.
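To make the co-learning objective concrete, the following is a minimal PyTorch-style sketch of bidirectional (mutual) distillation between two unimodal sub-networks and a fusion network, assuming each branch outputs class logits. The function names, temperature T, and weight alpha are our assumptions for illustration, not the authors' actual implementation, whose full training objective is richer.

    # Hedged sketch: two unimodal branches and a fusion network teach each
    # other by matching softened class posteriors, alongside their own
    # supervised losses. All names (soft_kl, T, alpha) are illustrative.
    import torch
    import torch.nn.functional as F

    def soft_kl(student_logits, teacher_logits, T=4.0):
        """KL divergence between temperature-softened distributions;
        the teacher side is detached so gradients flow only to the student."""
        p_s = F.log_softmax(student_logits / T, dim=1)
        p_t = F.softmax(teacher_logits.detach() / T, dim=1)
        return F.kl_div(p_s, p_t, reduction="batchmean") * T * T

    def co_learning_loss(sil_logits, ske_logits, fus_logits, labels, alpha=0.5):
        # Each branch keeps its own supervised objective...
        task = (F.cross_entropy(sil_logits, labels)
                + F.cross_entropy(ske_logits, labels)
                + F.cross_entropy(fus_logits, labels))
        # ...and distillation flows in both directions: the fusion network
        # learns from the unimodal sub-networks, and each sub-network learns
        # from the fusion network (semantic consistency across modalities).
        distill = (soft_kl(fus_logits, sil_logits) + soft_kl(fus_logits, ske_logits)
                   + soft_kl(sil_logits, fus_logits) + soft_kl(ske_logits, fus_logits))
        return task + alpha * distill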
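Similarly, the dynamic graph generation inside the ST-GRM can be sketched as follows: features of body parts or joints act as graph nodes and predict their own adjacency, which then drives aggregation of each node's own feature (independent linkage) together with its neighbors' features (interactive linkage). The shapes, layer names, and softmax-based edge weighting are our assumptions, not the authors' exact module.

    # Hedged sketch of a dynamic graph generator plus one round of graph
    # reasoning; edges are inferred from node features per sample rather
    # than fixed by the skeleton topology, so the graph tracks the body's
    # changing configuration. All names and shapes are illustrative.
    import torch
    import torch.nn as nn

    class DynamicGraphReasoning(nn.Module):
        def __init__(self, dim):
            super().__init__()
            self.query = nn.Linear(dim, dim)
            self.key = nn.Linear(dim, dim)
            self.update = nn.Linear(dim, dim)

        def forward(self, x):
            # x: (batch, num_nodes, dim) part/joint features for one clip
            q, k = self.query(x), self.key(x)
            scores = torch.matmul(q, k.transpose(1, 2)) / x.size(-1) ** 0.5
            # Dynamic adjacency: edge weights are predicted per sample.
            adj = torch.softmax(scores, dim=-1)
            # Residual keeps each node's independent feature; the adj-weighted
            # sum mixes in interactive features from linked nodes.
            return x + torch.relu(self.update(torch.matmul(adj, x)))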


Data availability

The datasets analysed during the current study are available at https://www.grew-benchmark.org and https://gait3d.github.io.


Author information


Contributions

Conceptualization, methodology, formal analysis, investigation, and writing (original draft preparation) were done by Jianbo Xiong; writing (review and editing) by Shinan Zou and Jianbo Xiong; funding acquisition, resources, and supervision by Jin Tang. Grammar correction and improving the readability of the paper were done by Tardi Tjahjadi.

Corresponding author

Correspondence to Jin Tang.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethical approval

We strictly adhere to the application protocols for public datasets (Gait3D and GREW). The data are used for academic research only and are not copied or sold. In addition, we adhere to both public datasets’ ethics and privacy statements.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original online version of this article was revised: there was a typo in the biography of Jianbo Xiong.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Xiong, J., Zou, S., Tang, J., et al.: MCDGait: multimodal co-learning distillation network with spatial-temporal graph reasoning for gait recognition in the wild. Vis Comput (2024). https://doi.org/10.1007/s00371-024-03426-y


  • DOI: https://doi.org/10.1007/s00371-024-03426-y
