Logit prototype learning with active multimodal representation for robust open-set recognition

  • Research Paper
  • Published:
Science China Information Sciences

Abstract

Robust open-set recognition (OSR) performance has become a prerequisite for pattern recognition systems in real-world applications. However, existing OSR methods are primarily built on single-modal perception, and their performance is limited when single-modal data fail to describe the objects sufficiently. Although multimodal data can provide more comprehensive information than single-modal data, the learning of decision boundaries can be affected by the feature representation gap between different modalities. To effectively integrate multimodal data for robust OSR, we propose logit prototype learning (LPL) with active multimodal representation. In LPL, the input multimodal data are transformed into the logit space, enabling a direct exploration of intermodal correlations without the impact of scale inconsistency. The fusion weight of each modality is then determined by an entropy-based uncertainty estimation method, which adaptively adjusts the fusion strategy to provide comprehensive descriptions in the presence of external disturbances. Moreover, the single-modal and multimodal representations are jointly and interactively optimized to learn discriminative decision boundaries. Finally, a stepwise recognition rule is employed to reduce the misclassification risk and to distinguish between known and unknown classes. Extensive experiments on three multimodal datasets demonstrate the effectiveness of the proposed method.
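
As a rough illustration of the fusion idea described in the abstract, the following Python sketch shows how per-modality logits could be weighted by an entropy-based confidence score and combined, with a simple threshold-based rejection of unknown samples standing in for the stepwise recognition rule. The function names, normalization, and threshold are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def entropy(p, eps=1e-12):
    """Shannon entropy of a probability vector."""
    return -np.sum(p * np.log(p + eps))

def fuse_logits(logits_per_modality):
    """Fuse per-modality logits with entropy-based weights.

    Modalities whose predictive distribution has lower entropy (i.e., less
    uncertainty) receive larger fusion weights. Illustrative scheme only.
    """
    probs = [softmax(z) for z in logits_per_modality]
    num_classes = len(probs[0])
    # Confidence in [0, 1]: 1 minus entropy normalized by its maximum log(K).
    conf = np.array([1.0 - entropy(p) / np.log(num_classes) for p in probs])
    weights = conf / (conf.sum() + 1e-12)
    fused = sum(w * z for w, z in zip(weights, logits_per_modality))
    return fused, weights

def stepwise_recognize(fused_logits, threshold=0.5):
    """Accept the top-scoring known class only if its fused probability
    exceeds the threshold; otherwise reject the sample as unknown."""
    p = softmax(fused_logits)
    top = int(np.argmax(p))
    return top if p[top] >= threshold else -1  # -1 denotes "unknown"

# Toy example: two modalities (e.g., visual and tactile), three known classes.
visual_logits = np.array([2.1, 0.3, -0.5])
tactile_logits = np.array([0.2, 0.1, 0.0])
fused, w = fuse_logits([visual_logits, tactile_logits])
print("fusion weights:", w, "decision:", stepwise_recognize(fused))
```

In this sketch the more confident (lower-entropy) modality dominates the fused decision, which mirrors the adaptive-weighting behavior the abstract describes when one modality is degraded by external disturbances.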

Acknowledgements

This work was supported in part by National Natural Science Foundation of China (Grant No. U20B2067), Innovation Foundation for Doctor Dissertation of Northwestern Polytechnical University (Grant No. CX2023015), and Cultivation Foundation for Excellent Doctoral Dissertation of the School of Automation of Northwestern Polytechnical University.

Author information

Corresponding author

Correspondence to Zhunga Liu.

About this article

Cite this article

Fu, Y., Liu, Z. & Wang, Z. Logit prototype learning with active multimodal representation for robust open-set recognition. Sci. China Inf. Sci. 67, 162204 (2024). https://doi.org/10.1007/s11432-023-3924-x
