Quality-Aware CLIP for Blind Image Quality Assessment

Pattern Recognition and Computer Vision (PRCV 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14430)


Abstract

Blind Image Quality Assessment (BIQA) aims to simulate human perception of image quality without reference images. Pretrained visual-linguistic models such as CLIP have shown excellent performance in various visual tasks and have been successfully applied to BIQA. However, existing CLIP-based approaches typically rely on coarse classification, dividing images into two or five quality levels using CLIP’s text-image comparison ability. In this work, we propose a novel approach to BIQA that introduces a fine-grained quality-level stratification strategy, which enables a more precise assessment of image quality across a wider range of levels. Additionally, we present a two-stage training model called Quality-Aware CLIP (QA-CLIP). In the first stage, we use a set of learnable text tokens to optimize the text description and fully exploit the representation capabilities of CLIP’s text encoder. In the second stage, we further optimize the image encoder and a quality-aware block to capture features that are highly relevant to perceived quality. Experimental results demonstrate that QA-CLIP performs comparably to state-of-the-art methods on various synthetic and real datasets. Notably, on the CSIQ, TID2013, and KADID datasets, QA-CLIP outperforms the state of the art by 1.2%, 4.7%, and 4.8%, respectively, in terms of Spearman Rank Correlation Coefficient (SRCC).
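The abstract does not specify how many quality levels are used, how level scores are aggregated, or how prompts are built, so the following is only a minimal sketch under stated assumptions. It illustrates one plausible reading of fine-grained stratification: a mean opinion score (MOS) is binned into `num_levels` equal-width levels, a predicted score is taken as the softmax-weighted expectation over level centers given image-text similarities, and predictions are evaluated with SRCC. The function names, the equal-width binning, the expectation rule, and the temperature value are all assumptions for illustration, not the authors' implementation.

```python
import math


def mos_to_level(mos, lo, hi, num_levels):
    """Map a MOS in [lo, hi] to an integer level in 0..num_levels-1 (equal-width bins, assumed)."""
    frac = (mos - lo) / (hi - lo)
    return min(int(frac * num_levels), num_levels - 1)


def level_centers(lo, hi, num_levels):
    """Return the midpoint score of each equal-width quality level."""
    width = (hi - lo) / num_levels
    return [lo + (k + 0.5) * width for k in range(num_levels)]


def expected_quality(similarities, centers, temperature=0.01):
    """Softmax over per-level image-text similarities, then expectation over level centers.

    `similarities` stands in for cosine similarities between one image embedding
    and one text embedding per quality level (hypothetical inputs).
    """
    peak = max(similarities)  # subtract the max for numerical stability
    weights = [math.exp((s - peak) / temperature) for s in similarities]
    total = sum(weights)
    return sum(w / total * c for w, c in zip(weights, centers))


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def srcc(x, y):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)
```

With more levels, the level centers sit closer together, so the expected score can resolve smaller quality differences than a two- or five-level scheme; SRCC then measures only whether the predicted ordering of images matches the ordering of the subjective scores.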



Acknowledgement

This work was supported by the National Key R&D Program of China (No. 2022ZD0118202), the National Science Fund for Distinguished Young Scholars (No. 62025603), the National Natural Science Foundation of China (No. U21B2037, No. U22B2051, No. 62176222, No. 62176223, No. 62176226, No. 62072386, No. 62072387, No. 62072389, No. 62002305 and No. 62272401), and the Natural Science Foundation of Fujian Province of China (No. 2021J01002, No. 2022J06001).

Author information

Corresponding author

Correspondence to Yan Zhang.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Pan, W., Yang, Z., Liu, D., Fang, C., Zhang, Y., Dai, P. (2024). Quality-Aware CLIP for Blind Image Quality Assessment. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14430. Springer, Singapore. https://doi.org/10.1007/978-981-99-8537-1_32


  • DOI: https://doi.org/10.1007/978-981-99-8537-1_32


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8536-4

  • Online ISBN: 978-981-99-8537-1

  • eBook Packages: Computer Science, Computer Science (R0)
