Abstract
With the rise of deep learning, scene image representation methods have achieved a significant boost in classification performance, particularly in accuracy. However, performance remains limited because scene images are mostly complex, exhibiting high intra-class dissimilarity and inter-class similarity. Several methods have been proposed in the literature to address these problems, each with its own advantages and limitations, and a detailed study of previous work is necessary to understand their strengths and weaknesses in image representation and classification. In this paper, we review the existing scene image representation methods that are widely used for image classification. First, we devise a taxonomy of the seminal methods proposed in the literature to date, grouping them into deep learning (DL)-based, computer vision (CV)-based, and search engine (SE)-based methods. Next, we compare their performance both qualitatively (e.g., quality of outputs and pros/cons) and quantitatively (e.g., accuracy). Last, we speculate on prominent research directions in scene image representation using keyword growth and timeline analysis. Overall, this survey provides in-depth insights into recent scene image representation methods and their applications across these three categories.
Data Availability
All data are publicly available.
Ethics declarations
Conflict of Interests
The authors declare that they have no conflict of interest.
About this article
Cite this article
Sitaula, C., Shahi, T.B., Marzbanrad, F. et al. Recent advances in scene image representation and classification. Multimed Tools Appl 83, 9251–9278 (2024). https://doi.org/10.1007/s11042-023-15005-9
DOI: https://doi.org/10.1007/s11042-023-15005-9