Abstract
Blind Image Quality Assessment (BIQA) aims to simulate human perception of image quality without reference images. Pretrained vision-language models such as CLIP have shown excellent performance on various visual tasks and have been successfully applied to BIQA. However, existing CLIP-based approaches typically employ a coarse classification scheme, dividing images into only two or five quality levels based on CLIP’s text-image comparison ability. In this work, we propose a novel approach to BIQA that introduces a fine-grained quality-level stratification strategy, enabling a more precise assessment of image quality across a wider range of levels. We also present a two-stage training model called Quality-Aware CLIP (QA-CLIP). In the first stage, we leverage a set of learnable text tokens to optimize the text description and fully exploit the representation capability of CLIP’s text encoder. In the second stage, we further optimize the image encoder and a quality-aware block to capture features that are highly relevant to perceived quality. Experimental results demonstrate that QA-CLIP achieves performance comparable to state-of-the-art methods on various synthetic and real-world datasets. Notably, on the CSIQ, TID2013, and KADID datasets, QA-CLIP outperforms the state of the art by 1.2%, 4.7%, and 4.8%, respectively, in terms of Spearman Rank Correlation Coefficient (SRCC).
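To make the CLIP-based scoring idea concrete, the snippet below is a minimal sketch of prompt-based quality prediction with fine-grained quality levels and SRCC evaluation. It assumes the openai `clip` package; the prompt wording, the five hand-written level descriptions (standing in for QA-CLIP’s learnable text tokens), and the expected-value mapping from level probabilities to a scalar score are illustrative assumptions, not the paper’s exact design.

```python
# Minimal sketch: fine-grained, prompt-based quality scoring with CLIP.
# Assumptions (not from the paper): the openai "clip" package, hand-written
# quality-level prompts in place of learnable text tokens, and a simple
# expected-value mapping from level probabilities to a scalar score.
import torch
import clip
from PIL import Image
from scipy.stats import spearmanr

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Fine-grained quality levels (illustrative wording); QA-CLIP instead learns
# the token embeddings of its quality descriptions in the first training stage.
levels = ["bad", "poor", "fair", "good", "perfect"]
prompts = [f"a photo of {q} quality" for q in levels]
text_tokens = clip.tokenize(prompts).to(device)

@torch.no_grad()
def predict_quality(image_path: str) -> float:
    """Return a scalar quality score in [0, 1] from level probabilities."""
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    image_feat = model.encode_image(image)
    text_feat = model.encode_text(text_tokens)
    image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    logits = 100.0 * image_feat @ text_feat.T      # cosine similarities
    probs = logits.softmax(dim=-1).squeeze(0)      # distribution over levels
    anchors = torch.linspace(0, 1, len(levels), device=device)
    return (probs * anchors).sum().item()          # expected quality level

# SRCC between predicted scores and ground-truth MOS on a held-out split:
# preds = [predict_quality(p) for p in image_paths]
# srcc, _ = spearmanr(preds, mos_labels)
```

A finer stratification simply extends the `levels` list; the learnable-token and quality-aware-block stages described in the abstract would replace the fixed prompts and the frozen image encoder, respectively.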
Acknowledgement
This work was supported by the National Key R&D Program of China (No. 2022ZD0118202), the National Science Fund for Distinguished Young Scholars (No. 62025603), the National Natural Science Foundation of China (No. U21B2037, No. U22B2051, No. 62176222, No. 62176223, No. 62176226, No. 62072386, No. 62072387, No. 62072389, No. 62002305 and No. 62272401), and the Natural Science Foundation of Fujian Province of China (No. 2021J01002, No. 2022J06001).
Cite this paper
Pan, W., Yang, Z., Liu, D., Fang, C., Zhang, Y., Dai, P. (2024). Quality-Aware CLIP for Blind Image Quality Assessment. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14430. Springer, Singapore. https://doi.org/10.1007/978-981-99-8537-1_32