
MCDGait: multimodal co-learning distillation network with spatial-temporal graph reasoning for gait recognition in the wild

  • Research
  • Published in The Visual Computer

Abstract

Gait recognition in the wild has attracted the attention of the academic community. However, existing unimodal algorithms cannot achieve the same performance on in-the-wild datasets as on in-the-lab datasets, because unimodal data have many limitations in in-the-wild environments. Therefore, we propose a multimodal approach that combines silhouettes and skeletons, and we formulate multimodal gait recognition as a multimodal co-learning problem. In particular, we propose a multimodal co-learning distillation network (MCDGait) that integrates two sub-networks processing unimodal data into a single fusion network. Based on the semantic consistency of the different modalities and the paradigm of deep mutual learning, the performance of the entire network is continuously improved via bidirectional knowledge distillation between the sub-networks and the fusion network. Inspired by the observation that specific body parts or joints exhibit unique motion characteristics and are linked with other parts or joints during walking, we propose a spatial-temporal graph reasoning module (ST-GRM). This module represents the parts or joints as graph nodes and the motion linkages between them as edges. By utilizing a dynamic graph generator, the module implicitly captures the dynamic changes of the human body. Based on the generated graphs, the independent spatial-temporal linkage feature of each part and the interactive spatial-temporal linkage features between parts are aggregated simultaneously. Extensive experiments conducted on two in-the-wild datasets demonstrate the state-of-the-art performance of the proposed method: the average rank-1 accuracy on Gait3D and GREW is 50.90% and 58.06%, respectively. The source code can be obtained from https://github.com/BoyeXiong/MCDGait.
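To make the co-learning objective concrete, the following is a minimal PyTorch-style sketch of bidirectional (mutual) distillation between two unimodal sub-networks and a fusion network, assuming each branch outputs class logits. The function names, temperature T, and weight alpha are our assumptions for illustration, not the authors' actual implementation, whose full training objective is richer.

    # Hedged sketch: two unimodal branches and a fusion network teach each
    # other by matching softened class posteriors, alongside their own
    # supervised losses. All names (soft_kl, T, alpha) are illustrative.
    import torch
    import torch.nn.functional as F

    def soft_kl(student_logits, teacher_logits, T=4.0):
        """KL divergence between temperature-softened distributions;
        the teacher side is detached so gradients flow only to the student."""
        p_s = F.log_softmax(student_logits / T, dim=1)
        p_t = F.softmax(teacher_logits.detach() / T, dim=1)
        return F.kl_div(p_s, p_t, reduction="batchmean") * T * T

    def co_learning_loss(sil_logits, ske_logits, fus_logits, labels, alpha=0.5):
        # Each branch keeps its own supervised objective...
        task = (F.cross_entropy(sil_logits, labels)
                + F.cross_entropy(ske_logits, labels)
                + F.cross_entropy(fus_logits, labels))
        # ...and distillation flows in both directions: the fusion network
        # learns from the unimodal sub-networks, and each sub-network learns
        # from the fusion network (semantic consistency across modalities).
        distill = (soft_kl(fus_logits, sil_logits) + soft_kl(fus_logits, ske_logits)
                   + soft_kl(sil_logits, fus_logits) + soft_kl(ske_logits, fus_logits))
        return task + alpha * distill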
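Similarly, the dynamic graph generation inside the ST-GRM can be sketched as follows: features of body parts or joints act as graph nodes and predict their own adjacency, which then drives aggregation of each node's own feature (independent linkage) together with its neighbors' features (interactive linkage). The shapes, layer names, and softmax-based edge weighting are our assumptions, not the authors' exact module.

    # Hedged sketch of a dynamic graph generator plus one round of graph
    # reasoning; edges are inferred from node features per sample rather
    # than fixed by the skeleton topology, so the graph tracks the body's
    # changing configuration. All names and shapes are illustrative.
    import torch
    import torch.nn as nn

    class DynamicGraphReasoning(nn.Module):
        def __init__(self, dim):
            super().__init__()
            self.query = nn.Linear(dim, dim)
            self.key = nn.Linear(dim, dim)
            self.update = nn.Linear(dim, dim)

        def forward(self, x):
            # x: (batch, num_nodes, dim) part/joint features for one clip
            q, k = self.query(x), self.key(x)
            scores = torch.matmul(q, k.transpose(1, 2)) / x.size(-1) ** 0.5
            # Dynamic adjacency: edge weights are predicted per sample.
            adj = torch.softmax(scores, dim=-1)
            # Residual keeps each node's independent feature; the adj-weighted
            # sum mixes in interactive features from linked nodes.
            return x + torch.relu(self.update(torch.matmul(adj, x)))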


Data availability

The datasets analysed during the current study are available at https://www.grew-benchmark.org and https://gait3d.github.io.


Author information


Contributions

Conceptualization, methodology, formal analysis, investigation, and writing (original draft preparation) were done by Jianbo Xiong; writing (review and editing) by Shinan Zou and Jianbo Xiong; funding acquisition, resources, and supervision by Jin Tang. Grammar correction and improving the readability of the paper were done by Tardi Tjahjadi.

Corresponding author

Correspondence to Jin Tang.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethical approval

We strictly adhere to the application protocols for public datasets (Gait3D and GREW). The data are used for academic research only and are not copied or sold. In addition, we adhere to both public datasets’ ethics and privacy statements.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original online version of this article was revised: there was a typo in the biography of Jianbo Xiong.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Xiong, J., Zou, S., Tang, J., et al.: MCDGait: multimodal co-learning distillation network with spatial-temporal graph reasoning for gait recognition in the wild. Vis Comput (2024). https://doi.org/10.1007/s00371-024-03426-y


  • DOI: https://doi.org/10.1007/s00371-024-03426-y
