Abstract
Blind Image Quality Assessment (BIQA) aims to simulate human perception of image quality without reference images. Pretrained vision-language models such as CLIP have shown excellent performance on various visual tasks and have been successfully applied to BIQA. However, existing CLIP-based approaches typically employ a coarse classification scheme, dividing images into only two or five quality levels based on CLIP’s text-image comparison ability. In this work, we propose a novel approach to BIQA that introduces a fine-grained quality-level stratification strategy, enabling a more precise assessment of image quality across a wider range of levels. We also present a two-stage training model called Quality-Aware CLIP (QA-CLIP). In the first stage, we leverage a set of learnable text tokens to optimize the text description and fully exploit the representation capability of CLIP’s text encoder. In the second stage, we further optimize the image encoder and a quality-aware block to capture features that are highly relevant to perceived quality. Experimental results demonstrate that QA-CLIP achieves performance comparable to state-of-the-art methods on various synthetic and real-world datasets. Notably, on the CSIQ, TID2013, and KADID datasets, QA-CLIP outperforms the state of the art by 1.2%, 4.7%, and 4.8%, respectively, in terms of Spearman Rank Correlation Coefficient (SRCC).
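To make the CLIP-based scoring idea concrete, the snippet below is a minimal sketch of prompt-based quality prediction with fine-grained quality levels and SRCC evaluation. It assumes the openai `clip` package; the prompt wording, the five hand-written level descriptions (standing in for QA-CLIP’s learnable text tokens), and the expected-value mapping from level probabilities to a scalar score are illustrative assumptions, not the paper’s exact design.

```python
# Minimal sketch: fine-grained, prompt-based quality scoring with CLIP.
# Assumptions (not from the paper): the openai "clip" package, hand-written
# quality-level prompts in place of learnable text tokens, and a simple
# expected-value mapping from level probabilities to a scalar score.
import torch
import clip
from PIL import Image
from scipy.stats import spearmanr

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Fine-grained quality levels (illustrative wording); QA-CLIP instead learns
# the token embeddings of its quality descriptions in the first training stage.
levels = ["bad", "poor", "fair", "good", "perfect"]
prompts = [f"a photo of {q} quality" for q in levels]
text_tokens = clip.tokenize(prompts).to(device)

@torch.no_grad()
def predict_quality(image_path: str) -> float:
    """Return a scalar quality score in [0, 1] from level probabilities."""
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    image_feat = model.encode_image(image)
    text_feat = model.encode_text(text_tokens)
    image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    logits = 100.0 * image_feat @ text_feat.T      # cosine similarities
    probs = logits.softmax(dim=-1).squeeze(0)      # distribution over levels
    anchors = torch.linspace(0, 1, len(levels), device=device)
    return (probs * anchors).sum().item()          # expected quality level

# SRCC between predicted scores and ground-truth MOS on a held-out split:
# preds = [predict_quality(p) for p in image_paths]
# srcc, _ = spearmanr(preds, mos_labels)
```

A finer stratification simply extends the `levels` list; the learnable-token and quality-aware-block stages described in the abstract would replace the fixed prompts and the frozen image encoder, respectively.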
Acknowledgement
This work was supported by the National Key R&D Program of China (No. 2022ZD0118202), the National Science Fund for Distinguished Young Scholars (No. 62025603), the National Natural Science Foundation of China (No. U21B2037, No. U22B2051, No. 62176222, No. 62176223, No. 62176226, No. 62072386, No. 62072387, No. 62072389, No. 62002305 and No. 62272401), and the Natural Science Foundation of Fujian Province of China (No. 2021J01002, No. 2022J06001).
Cite this paper
Pan, W., Yang, Z., Liu, D., Fang, C., Zhang, Y., Dai, P. (2024). Quality-Aware CLIP for Blind Image Quality Assessment. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14430. Springer, Singapore. https://doi.org/10.1007/978-981-99-8537-1_32