Logit prototype learning with active multimodal representation for robust open-set recognition

  • Research Paper
  • Published:
Science China Information Sciences

Abstract

Robust open-set recognition (OSR) performance has become a prerequisite for pattern recognition systems in real-world applications. However, existing OSR methods are primarily built on single-modal perception, and their performance is limited when single-modal data fail to describe the objects sufficiently. Although multimodal data can provide more comprehensive information than single-modal data, the learning of decision boundaries can be affected by the feature representation gap between different modalities. To effectively integrate multimodal data for robust OSR, we propose logit prototype learning (LPL) with active multimodal representation. In LPL, the input multimodal data are transformed into the logit space, enabling a direct exploration of intermodal correlations without the impact of scale inconsistency. The fusion weight of each modality is then determined by an entropy-based uncertainty estimation method, which adaptively adjusts the fusion strategy to provide comprehensive descriptions in the presence of external disturbances. Moreover, the single-modal and multimodal representations are jointly and interactively optimized to learn discriminative decision boundaries. Finally, a stepwise recognition rule is employed to reduce the misclassification risk and to distinguish between known and unknown classes. Extensive experiments on three multimodal datasets demonstrate the effectiveness of the proposed method.
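
As a rough illustration of the fusion idea described in the abstract, the following Python sketch shows how per-modality logits could be weighted by an entropy-based confidence score and combined, with a simple threshold-based rejection of unknown samples standing in for the stepwise recognition rule. The function names, normalization, and threshold are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def entropy(p, eps=1e-12):
    """Shannon entropy of a probability vector."""
    return -np.sum(p * np.log(p + eps))

def fuse_logits(logits_per_modality):
    """Fuse per-modality logits with entropy-based weights.

    Modalities whose predictive distribution has lower entropy (i.e., less
    uncertainty) receive larger fusion weights. Illustrative scheme only.
    """
    probs = [softmax(z) for z in logits_per_modality]
    num_classes = len(probs[0])
    # Confidence in [0, 1]: 1 minus entropy normalized by its maximum log(K).
    conf = np.array([1.0 - entropy(p) / np.log(num_classes) for p in probs])
    weights = conf / (conf.sum() + 1e-12)
    fused = sum(w * z for w, z in zip(weights, logits_per_modality))
    return fused, weights

def stepwise_recognize(fused_logits, threshold=0.5):
    """Accept the top-scoring known class only if its fused probability
    exceeds the threshold; otherwise reject the sample as unknown."""
    p = softmax(fused_logits)
    top = int(np.argmax(p))
    return top if p[top] >= threshold else -1  # -1 denotes "unknown"

# Toy example: two modalities (e.g., visual and tactile), three known classes.
visual_logits = np.array([2.1, 0.3, -0.5])
tactile_logits = np.array([0.2, 0.1, 0.0])
fused, w = fuse_logits([visual_logits, tactile_logits])
print("fusion weights:", w, "decision:", stepwise_recognize(fused))
```

In this sketch the more confident (lower-entropy) modality dominates the fused decision, which mirrors the adaptive-weighting behavior the abstract describes when one modality is degraded by external disturbances.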

Acknowledgements

This work was supported in part by National Natural Science Foundation of China (Grant No. U20B2067), Innovation Foundation for Doctor Dissertation of Northwestern Polytechnical University (Grant No. CX2023015), and Cultivation Foundation for Excellent Doctoral Dissertation of the School of Automation of Northwestern Polytechnical University.

Author information

Corresponding author

Correspondence to Zhunga Liu.

About this article

Cite this article

Fu, Y., Liu, Z. & Wang, Z. Logit prototype learning with active multimodal representation for robust open-set recognition. Sci. China Inf. Sci. 67, 162204 (2024). https://doi.org/10.1007/s11432-023-3924-x
