1 Introduction

Texture is a fundamental property of an image that aids in its identification. Texture analysis forms the foundation for computer vision problems such as image recognition, image retrieval [37] and segmentation. Images from domains such as satellite imagery [30], forestry [27] and medicine [10] can be identified by the textures they contain. The texture of an object provides important insights into its properties and behaviour. These insights help in computer vision tasks where the shape of the object alone is not informative. Texture is therefore one of the key components in image analysis, which makes texture classification an important task. Over the past years, considerable effort has gone into developing models that can identify and classify textures efficiently.

Classic machine learning approaches to this task use hand-engineered features to extract information and a statistical algorithm such as an SVM as the final classification stage [43]. These approaches were previously preferred, but in recent times they have been outperformed by deep learning methods, particularly convolutional neural networks (CNNs). After AlexNet [18] won the 2012 ImageNet Large Scale Visual Recognition Challenge, the use of convolutional neural networks for image classification tasks grew rapidly. Today, most significant models in computer vision for tasks such as image classification, segmentation and recognition use convolutional neural networks.

CNNs learn features using weight sharing and local connectivity, which lets them detect patterns at all locations in the image. The initial layers of a CNN learn simple features such as edges, and the deeper layers learn more complex features. CNNs can therefore learn texture patterns of various complexities and scales. Recent convolutional neural network models perform better than the classic machine learning algorithms. This paper aims to propose models that perform better than previously proposed models and improve the texture classification approach.

This paper proposes a transfer learning approach for the texture classification problem. Transfer learning is an approach in which the knowledge gained while learning to classify the classes of one dataset is applied to a different dataset from a related problem. It aims to leverage labelled data from one feature space to enhance classification in a different learning space. This approach works well when the source dataset (on which the model is trained) and the target dataset (the one under study) come from a similar domain, so that their feature spaces are similar. In transfer learning, the top layer of the pre-trained model is replaced by a new layer whose number of neurons equals the number of classes of the target dataset.

There are two types of transfer learning approaches. The first is feature extraction, wherein only the top layer is trained on the target dataset and the rest of the model is frozen. The frozen layers act as feature extractors on the target dataset. The idea is that features learnt on one kind of dataset can extract valuable information from another dataset. The second type is fine-tuning, wherein only a few or none of the layers are frozen, and the remaining layers are trained on the target dataset along with the top layer.
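The difference between the two approaches comes down to which layers are marked as trainable. The following is a minimal sketch of that bookkeeping only, not the training code used in this work; the `Layer` class, the layer names and all parameter counts are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    n_params: int
    trainable: bool = True


def new_top_layer(n_features, n_classes):
    # Dense layer: one weight per (feature, class) pair plus one bias per class.
    return Layer("new_top", n_features * n_classes + n_classes)


def feature_extraction(base_layers, n_features, n_classes):
    """Freeze every pre-trained layer; only the new top layer is trainable."""
    frozen = [Layer(l.name, l.n_params, trainable=False) for l in base_layers]
    return frozen + [new_top_layer(n_features, n_classes)]


def fine_tuning(base_layers, n_features, n_classes, n_frozen=0):
    """Freeze the first n_frozen layers; train the rest and the new top layer."""
    layers = [Layer(l.name, l.n_params, trainable=(i >= n_frozen))
              for i, l in enumerate(base_layers)]
    return layers + [new_top_layer(n_features, n_classes)]


def trainable_params(layers):
    return sum(l.n_params for l in layers if l.trainable)


# Hypothetical pre-trained base (illustrative names and sizes only).
base = [Layer("conv_block_1", 1_000),
        Layer("conv_block_2", 5_000),
        Layer("conv_block_3", 20_000)]

fe_model = feature_extraction(base, n_features=64, n_classes=28)
ft_model = fine_tuning(base, n_features=64, n_classes=28, n_frozen=1)
print(trainable_params(fe_model))  # new top layer only
print(trainable_params(ft_model))  # top layer plus the unfrozen base layers
```

With feature extraction, the number of weights updated during training depends only on the size of the new top layer, which is why this mode is typically the cheaper of the two.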

Transfer learning helps leverage the knowledge learnt by a model on one dataset to extract information from another dataset. It also avoids the time needed to learn all the weights of the convolutional layers from scratch. Using the knowledge of a pre-trained model may also help the model learn the task more completely than a model built from scratch. The pre-trained models used in this paper are MobileNetV3 and InceptionV3. The presented work focuses on:

  • Studying transfer learning on texture datasets.

  • Achieving better results on the provided benchmark datasets than previous work on the same datasets.

The rest of the paper is organised as follows. Section 2 discusses the literature survey of the related work. Section 3 covers the materials and methods. Section 4 presents the experiments and results. Finally, Section 5 concludes the work.

2 Literature review

There has been a lot of research dedicated to texture analysis owing to its importance in the field of computer vision. In 1993, [29] applied two powerful techniques, Principal Component Analysis and Multiscale Autoregressive models, to the Brodatz dataset. The variety of homogeneous and non-homogeneous images studied in that paper was greater than in previous work, and the approach obtained better results than the models proposed before it. In 1994, an energy-based approach was proposed in [38]. This model achieved an accuracy of over 90% for the classification of images.

Statistical methods are among the earliest methods for texture analysis of images and have given good results on standard texture datasets. Ramola et al. [31] discuss different statistical approaches such as the grey level co-occurrence matrix (GLCM), local binary pattern (LBP), autocorrelation function (ACF) and histogram pattern. Their research and discussion concluded that GLCM is the best approach for texture analysis. The major drawbacks of the GLCM model are the high matrix dimensionality and the high correlation between Haralick features. Feng et al. [9] and [5] have also implemented such statistical models on standard datasets and obtained good results.
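To make the GLCM idea concrete, the sketch below counts how often a pixel with grey level i has a neighbour with grey level j at a fixed offset, and derives one Haralick-style feature (contrast) from the counts. It is an illustrative sketch and not code from the cited works; the function names, the choice of a one-pixel horizontal offset and the tiny 4-level example image are assumptions.

```python
def glcm(image, levels, dx=1, dy=0):
    """Non-symmetric co-occurrence counts for the pixel offset (dx, dy)."""
    h, w = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                counts[image[y][x]][image[ny][nx]] += 1
    return counts


def contrast(counts):
    """Sum of p(i, j) * (i - j)^2 over the normalised co-occurrence matrix."""
    total = sum(map(sum, counts))
    weighted = sum(c * (i - j) ** 2
                   for i, row in enumerate(counts)
                   for j, c in enumerate(row))
    return weighted / total


# Small example image with grey levels 0..3 (illustrative data).
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 2],
         [2, 2, 3, 3]]

matrix = glcm(image, levels=4)
for row in matrix:
    print(row)
print(contrast(matrix))
```

The matrix has one row and one column per grey level, so its size grows quadratically with the number of levels, which is the dimensionality drawback mentioned above.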

Xu et al. [42] proposed a novel texture descriptor that is robust to variations in rotation, scale and illumination. It combines dominant orientation analysis and multifractal analysis based on the Gabor filter. This approach was then evaluated on the Brodatz and Outex datasets.

Sana and Islam [32] proposed the power-law transform (PLT) to extract new spectral texture features. This technique outperformed the widely used Gabor features. As seen, machine learning approaches have had excellent results on standard datasets for texture analysis. However, these algorithms require handcrafted features for feature extraction. Also, such models cannot be reused for feature extraction on images of another dataset, as is possible with deep learning architectures through transfer learning.

Zheng et al. [44] proposed an eight-feature learning model alongside a deep learning architecture based on perceptrons. Their paper showed the advantage of the deep learning model over the other model. In recent years, convolutional neural networks have surpassed standard artificial neural networks in the field of computer vision. CNNs have also revolutionised other fields such as natural language processing, image and video recognition, information retrieval, grayscale colourisation and multi-dimensional data processing, and have surpassed many machine learning algorithms. LeCun, Boser et al. [21] proposed CNNs in 1989, three decades ago, but they did not become popular then because of the lack of data and computational power. Today abundant data is available, the computational power of computers has increased drastically, and there has been considerable progress in developing better optimisation algorithms. Algorithms such as stochastic gradient descent with momentum (SGDM) and RMSprop have emerged as favourites for optimisation. All these factors have contributed to the success of CNNs today [23].

Many CNN architectures, such as AlexNet [17] and VGG [16], have been used with the transfer learning technique, for example to determine three different types of fruits and their relative freshness, with great results. Kundo et al. [19] proposed a bagging ensemble of three transfer learning models, InceptionV3, ResNet34 and DenseNet201, that outperformed the state-of-the-art methods by 1.56%. Nadeem et al. [26] used transfer learning for Pakistani traffic-sign recognition. They used a model trained on German traffic-sign recognition and, with additional pre-processing and regularisation, achieved competitive results on a small available dataset. Transfer learning has also been widely employed in the medical domain. Arora et al. [3] used a transfer learning-based approach for detecting COVID-19 in lung CT scans. They achieved a precision of 100% using the MobileNet architecture on the SARS-COV-2 CT-Scan dataset. This paper also uses the transfer learning approach.

In recent years, transformer-based architectures have revolutionised many domains of deep learning. The transformer was originally proposed in [41], where the authors introduced an architecture based on an attention mechanism, dispensing with recurrence and convolutions entirely. Their model was evaluated on two translation tasks and outperformed the other models in terms of results and training time. Dosovitskiy et al. [7] proposed Vision Transformers (ViT), inspired by the transformer architectures for Natural Language Processing (NLP) tasks. Their study showed that ViT outperformed conventional convolutional networks in terms of results and training time on standard datasets such as ImageNet.

The following sections of this paper discuss the materials and methods used and the experiments and results obtained. The last section summarises the paper and talks about the future scope.

3 Materials and methods

Figure 1 depicts the workflow followed. The first step was to define the problem statement. The next step was to collect datasets related to the problem statement. After the data was collected, it was preprocessed to bring it to the desired format and size. The pre-processing stage also included data augmentation, which was done to avoid over-fitting the model. After pre-processing, models were designed for the problem statement and then tested on the pre-processed datasets. Transfer learning models are used to classify the different datasets collected. We use the MobileNetV3 and InceptionV3 models for the classification task.

Fig. 1 Flow graph

3.1 Dataset

We have used three standard benchmark datasets of the texture classification problem. These are the Brodatz dataset, Kylberg dataset and the Outex dataset. Below is the summary of these datasets.

3.1.1 Brodatz dataset

The Brodatz dataset [4] is a very popular dataset for texture classification problems. The dataset was obtained from the University of Southern California[]. The original dataset did not contain rotated images. In this paper, we generated rotated versions of these images using 40 different rotation angles. This dataset has 112 classes. Samples of this dataset are displayed in Fig. 2, and a summary is given in Table 1.

Fig. 2 Samples of the Brodatz dataset

Table 1 Summary of the Brodatz dataset

3.1.2 Kylberg dataset

The Kylberg dataset is another widely used dataset for texture classification problems. This dataset has two versions: (1) with rotation patches and (2) without rotation patches [20]. We have used v1.0, which is the version without rotation patches. The classes of this dataset are blanket1, blanket2, canvas1, ceiling1, ceiling2, cushion1, floor1, floor2, grass1, lentils1, linseeds1, oatmeal1, pearlsugar1, rice1, rice2, rug1, sand1, scarf1, scarf2, screen1, seat1, seat2, sesameseeds1, stone1, stone2, stone3, stoneslab1 and wall1. Samples of this dataset are displayed in Fig. 3, and a summary is given in Table 2.

Fig. 3 Samples of the Kylberg dataset

Table 2 Summary of the Kylberg dataset

3.1.3 Outex dataset

The Outex [28] database contains many datasets. We use the Outex_TC_00012 dataset of this database, which we obtained from the University of Oulu. The classes of this dataset are canvas001, canvas002, canvas003, canvas005, canvas006, canvas009, canvas011, canvas021, canvas022, canvas023, canvas025, canvas026, canvas031, canvas032, canvas033, canvas035, canvas038, canvas039, tile005, tile006, carpet002, carpet004, carpet005 and carpet009. Samples of this dataset are displayed in Fig. 4, and a summary is given in Table 3.

Fig. 4 Samples of the Outex dataset

Table 3 Summary of the Outex dataset

3.2 Data preprocessing and splitting

Data preprocessing is one of the most critical steps. This step makes the raw data compatible with the deep learning model. Images in the Outex and Brodatz datasets are in GIF format and are converted to a compatible format.

After the data is converted to a compatible format, the images are resized to 224*224*3 to make them compatible with the pre-trained models. After preprocessing, the data is split. The Kylberg, Outex_TC_00012 and Brodatz datasets are each split in a ratio of 80:20 into training and testing data.
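An 80:20 split of this kind can be sketched as follows. This is a minimal illustration rather than the splitting code used in this work: splitting each class separately (a stratified split), the fixed random seed and the placeholder file names are all assumptions.

```python
import random
from collections import defaultdict


def stratified_split(samples, train_fraction=0.8, seed=0):
    """Split (item, label) pairs into train and test lists, class by class."""
    by_label = defaultdict(list)
    for item, label in samples:
        by_label[label].append(item)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        cut = int(round(train_fraction * len(items)))
        train += [(item, label) for item in items[:cut]]
        test += [(item, label) for item in items[cut:]]
    return train, test


# Placeholder data: 3 classes with 10 hypothetical image names each.
samples = [(f"{label}_{i}.png", label)
           for label in ("canvas1", "grass1", "rice1")
           for i in range(10)]

train, test = stratified_split(samples)
print(len(train), len(test))
```

Splitting per class keeps the class proportions identical in the training and testing sets, which matters when every class has only a small number of images.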

3.3 Data augmentation

As discussed earlier, data augmentation refers to creating more data out of the already existing data. The intuition is that an image of a textured surface that has been rotated by an angle or flipped along an axis remains an image of that surface. Since only one image is available for each class in the Brodatz dataset, more images are produced by rotating the original images by different angles. Data augmentation is also applied to the data of all three datasets. Figures 5, 6 and 7 show samples of the data obtained after subjecting the Kylberg, Brodatz and Outex datasets, respectively, to re-scaling and data augmentation.
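Rotation and flipping of this kind can be sketched in a few lines. The example below is illustrative only and is not the augmentation pipeline used in this work: it represents an image as a nested list, uses nearest-neighbour sampling, fills uncovered corners with a constant, and assumes 40 evenly spaced angles, none of which is specified in the text.

```python
import math


def rotate(image, angle_deg, fill=0):
    """Rotate about the image centre using nearest-neighbour sampling."""
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = x - cx, y - cy
            # Inverse mapping: locate the source pixel that lands on (x, y).
            sx = cos_a * dx + sin_a * dy + cx
            sy = -sin_a * dx + cos_a * dy + cy
            ix, iy = round(sx), round(sy)
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = image[iy][ix]
    return out


def flip_horizontal(image):
    return [list(reversed(row)) for row in image]


def augment(image, angles):
    """Return one rotated copy and one rotated-and-flipped copy per angle."""
    rotated = [rotate(image, angle) for angle in angles]
    return rotated + [flip_horizontal(r) for r in rotated]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
angles = [i * 360 / 40 for i in range(40)]  # assumed: 40 evenly spaced angles
augmented = augment(image, angles)
print(len(augmented))
print(rotate(image, 90))
```

For angles that are not multiples of 90 degrees, the corners of the rotated image fall outside the original frame; here they are filled with a constant value, which is only one of several possible choices.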

Fig. 5 Sample images of the Kylberg dataset after pre-processing and data augmentation

Fig. 6 Sample images of the Brodatz dataset after pre-processing and data augmentation

Fig. 7 Sample images of the Outex_TC_00012 dataset after pre-processing and data augmentation

3.4 Proposed model

This paper uses pre-trained models, a methodology called transfer learning. The intuition is to use the knowledge gained by a model on one problem to solve another similar problem. This reduces the time spent on training a model from scratch. A pre-trained model may also learn the problem more completely than a model trained from scratch. This paper uses the MobileNetV3 and InceptionV3 models. For each approach, the last dense layer (classification layer) of the pre-trained model is replaced with a softmax layer suitable for classifying the texture classes of that dataset. In this work, the following transfer learning techniques were implemented (Fig. 8):

  • Feature extraction: Here, we froze all the layers of the pre-trained model and trained only the added dense layer, so the pre-trained model serves purely as a feature extractor for the classifier.

  • Full fine-tuning: Here, the whole pre-trained model was fine-tuned using the data in use.
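In both techniques the replacement classification layer is a dense layer followed by a softmax over the texture classes. A minimal sketch of that layer is given below; the four-element feature vector and the zero-initialised weights are placeholders for illustration, and in practice the features come from the pre-trained model and the weights are learnt.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classification_head(features, weights, biases):
    """Dense layer plus softmax; weights[k] holds the weights for class k."""
    logits = [sum(w * f for w, f in zip(w_k, features)) + b_k
              for w_k, b_k in zip(weights, biases)]
    return softmax(logits)


n_classes = 28                     # e.g. the 28 Kylberg classes
features = [0.5, -1.0, 2.0, 0.0]   # placeholder feature vector
weights = [[0.0] * len(features) for _ in range(n_classes)]
biases = [0.0] * n_classes

probs = classification_head(features, weights, biases)
print(len(probs), sum(probs))
```

With all weights at zero every class receives the same probability, which is the expected starting point before the layer has been trained on the target dataset.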

Fig. 8 Proposed model

3.4.1 Transfer learning

In many real-world applications, it is difficult to collect enough data to build a model from scratch. In such scenarios, the idea of transfer learning comes in. As discussed earlier, transfer learning is an approach wherein a model trained on a vast dataset is used to solve a related problem. In the medical domain, the number of samples is limited because the procedure of collecting the data is both expensive and complicated. In such situations, using a pre-trained model is more effective than training a model from scratch. One such example is breast cancer classification [34], where the goal is to classify whether a cancer is malignant or benign. That paper compared the results from a pre-trained model and a model trained from scratch, and the results obtained by transfer learning surpassed those obtained by the model trained from scratch. In this paper, we have used TensorFlow Hub to import such pre-trained models without their top layers. A softmax layer is then added to these layers. For the Kylberg dataset, only the last layer is trained. For the Outex and Brodatz datasets, the models were fine-tuned.


MobileNet was proposed by Sandler, Howard [33]. This model achieves a good balance between performance and computation cost. MobileNet offers an extremely efficient network architecture that can easily match the requirements of mobile and embedded applications. This paper makes use of the MobileNetV3 small model, which was proposed in [12]. The MobileNetV3 model is loaded from TensorFlow Hub, where it has been trained on ImageNet (ILSVRC-2012-CLS) data. The model is used as a feature extractor for the Kylberg dataset without tuning. The model is fully fine-tuned for the Outex and Brodatz datasets. Figure 9 summarises the MobileNetV3 architecture.

Fig. 9 Summary of the MobileNetV3 model


InceptionV3 [40] is the third edition of Google’s Inception Convolutional Neural Network. The Inception modules are well-designed convolution modules that can generate discriminatory features and reduce the number of parameters. The InceptionV1 model was introduced at the 2014 ILSVRC classification challenge, where VGGNet [36] was also presented for the first time. Both gained similar results. However, Inception architecture had the advantage of performing well even under strict constraints on memory and computational budget.

The InceptionV1 [39] model addressed the variation in the scale of information in images by using filters of different sizes in parallel, which gives a wider network. It is 22 layers deep (27, including the pooling layers). It uses global average pooling at the end of the last inception module. Because it is a deep network, it is subject to the vanishing gradient problem. To prevent the middle part of the network from “dying out,” it uses two auxiliary classifiers.

Neural networks perform better when convolutions don’t alter the dimensions of the input drastically. Reducing the dimensions too much may cause loss of information, known as a “representational bottleneck.” InceptionV2 [40] model overcame this problem by expanding the filterbanks. InceptionV2 also used clever factorization methods to make the convolution more efficient in terms of computation complexity.
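The computational saving from factorisation can be seen by counting weights. The sketch below assumes the two factorisations commonly described for this family of models, namely replacing a 5×5 convolution with two stacked 3×3 convolutions and replacing an n×n convolution with a 1×n convolution followed by an n×1 convolution; the channel count of 64 is an arbitrary illustrative value and biases are ignored.

```python
def conv_params(kernel_h, kernel_w, c_in, c_out):
    """Number of weights in one convolution layer, ignoring biases."""
    return kernel_h * kernel_w * c_in * c_out


c = 64  # hypothetical channel count, kept equal for input and output

# One 5x5 convolution versus two stacked 3x3 convolutions.
full_5x5 = conv_params(5, 5, c, c)
two_3x3 = 2 * conv_params(3, 3, c, c)

# One 7x7 convolution versus a 1x7 convolution followed by a 7x1 convolution.
full_7x7 = conv_params(7, 7, c, c)
asym_7 = conv_params(1, 7, c, c) + conv_params(7, 1, c, c)

print(full_5x5, two_3x3, two_3x3 / full_5x5)
print(full_7x7, asym_7, asym_7 / full_7x7)
```

Under these assumptions the stacked 3×3 pair needs 18/25 of the weights of the 5×5 layer, and the asymmetric pair needs 2/7 of the weights of the 7×7 layer, independent of the channel count.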

InceptionV3 includes all the upgrades of InceptionV2. In addition, it uses the RMSProp optimizer, BatchNorm in the auxiliary classifiers, and label smoothing to prevent overfitting. Figure 10 summarises the InceptionV3 architecture.
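Label smoothing replaces the hard one-hot target with a slightly softened distribution, so the network is not pushed towards fully confident predictions. A minimal sketch is given below; the smoothing value of 0.1 and the uniform spreading of the removed probability mass over all classes are assumptions made for illustration.

```python
def smooth_labels(one_hot, epsilon=0.1):
    """Mix the one-hot target with a uniform distribution over all classes."""
    k = len(one_hot)
    return [(1.0 - epsilon) * y + epsilon / k for y in one_hot]


target = [0, 1, 0, 0]              # class 1 out of 4 classes
smoothed = smooth_labels(target)
print(smoothed)
print(sum(smoothed))
```

The smoothed target still sums to one and still has its largest entry at the true class, only with a reduced margin over the other classes.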

Fig. 10 Summary of the InceptionV3 model

4 Experiments and results

4.1 Hardware and software setup

A Tesla K80 GPU and 13 GB of RAM were used for training in Google Colab, with the TensorFlow, Keras and Scikit-learn libraries and code written in Python 3.7.10.

4.2 Training and testing data

The Kylberg, Brodatz and Outex datasets are split into training data (80%) and testing data (20%). Adam optimisation and the categorical cross-entropy loss function are used in all cases, with a learning rate of 0.01 and a batch size of 32. Only proposed model 1 for the Kylberg dataset is fully trained on the training data. In all other cases, the pre-trained model is used as a feature extractor, and only the added top layer is trained on the training data.
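For reference, a single Adam update with the learning rate of 0.01 used here can be sketched as follows. This is the standard update rule applied to a toy one-dimensional problem, not the training loop of this work; the decay rates (0.9 and 0.999), the epsilon value and the quadratic objective are assumptions chosen for illustration.

```python
import math


def adam_step(theta, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


# Toy objective f(x) = (x - 3)^2 with gradient 2 * (x - 3).
x, m, v = 0.0, 0.0, 0.0
history = []
for t in range(1, 2001):
    grad = 2 * (x - 3.0)
    x, m, v = adam_step(x, grad, m, v, t)
    history.append(x)

print(history[0])   # first step has magnitude close to the learning rate
print(history[-1])  # approaches the minimiser at x = 3
```

Because the update divides by the root of the squared-gradient average, the size of the first step is set by the learning rate rather than by the raw gradient magnitude.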

4.3 Evaluation criteria

In the prediction phase, seven quantitative performance measures were computed to assess the reliability of the trained models using the validation data: precision, recall, F1-score, accuracy, macro average, weighted average and Cohen's kappa score. These metrics are computed from the numbers of True Positives (TP), True Negatives (TN), False Positives (FP) and False Negatives (FN).

$$ Precision = \frac{TP}{TP + FP} $$
$$ Recall = \frac{TP}{TP + FN} $$
$$ F1Score = 2 \ast \frac{Precision \ast Recall}{Precision + Recall} $$
$$ Accuracy = \frac{TP + TN}{TP + FN + TN + FP} $$
$$ Weighted\ avg = F1_{class1} \ast W_{1} + F1_{class2} \ast W_{2} + F1_{class3} \ast W_{3} + {\cdots} + F1_{classn} \ast W_{n} $$

F1_{classm}: F1 score of class m; W_{m}: weight of class m, i.e. the fraction of samples that belong to class m; n: number of classes.

$$ Macro\ avg = \frac{F1_{class1} + F1_{class2} + F1_{class3} + {\cdots} + F1_{classn}}{n} $$

F1_{classm}: F1 score of class m; n: number of classes.

Cohen kappa score:

$$ K = \frac{p_{0} - p_{e}}{1 - p_{e}} $$

p_{0} = relative observed agreement among raters (here, between the predicted and the true labels), p_{e} = the hypothetical probability of chance agreement.
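All seven measures can be derived from a multi-class confusion matrix. The sketch below is an illustrative stdlib-only implementation, not the Scikit-learn routines used in this work; the function name and the 2×2 example matrix are assumptions. Rows index the true class and columns the predicted class.

```python
def classification_metrics(cm):
    """cm[i][j] = number of samples of true class i predicted as class j."""
    n = len(cm)
    total = sum(map(sum, cm))
    true_counts = [sum(cm[i]) for i in range(n)]                       # support
    pred_counts = [sum(cm[i][j] for i in range(n)) for j in range(n)]

    precision, recall, f1 = [], [], []
    for k in range(n):
        tp = cm[k][k]
        p = tp / pred_counts[k] if pred_counts[k] else 0.0   # TP / (TP + FP)
        r = tp / true_counts[k] if true_counts[k] else 0.0   # TP / (TP + FN)
        precision.append(p)
        recall.append(r)
        f1.append(2 * p * r / (p + r) if p + r else 0.0)

    accuracy = sum(cm[k][k] for k in range(n)) / total
    macro_avg = sum(f1) / n
    weighted_avg = sum(f * s / total for f, s in zip(f1, true_counts))

    # Cohen's kappa: observed agreement versus agreement expected by chance.
    p_e = sum(t * p for t, p in zip(true_counts, pred_counts)) / total ** 2
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 1.0

    return {"precision": precision, "recall": recall, "f1": f1,
            "accuracy": accuracy, "macro_avg": macro_avg,
            "weighted_avg": weighted_avg, "kappa": kappa}


# Illustrative 2-class confusion matrix (made-up counts).
cm = [[5, 1],
      [2, 2]]
metrics = classification_metrics(cm)
for name, value in metrics.items():
    print(name, value)
```

In the multi-class setting the observed agreement in Cohen's kappa equals the accuracy, so kappa can be read as accuracy corrected for the agreement that class frequencies alone would produce.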

4.4 Training single convolution model

All the images in the .gif or the .ras format were converted to a compatible format. After that, all the images of the three datasets in the study were rescaled to a size of 224*224. The images were then normalised so that their pixel values range from 0 to 1. The Kylberg and Brodatz datasets were then subjected to data augmentation before being passed to the proposed model.
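The rescaling and normalisation steps can be sketched as below. This is a simplified illustration on a nested-list greyscale image, not the preprocessing code used in this work; nearest-neighbour interpolation and a maximum pixel value of 255 are assumptions.

```python
def resize_nearest(image, new_h, new_w):
    """Resize by copying, for each output pixel, the nearest source pixel."""
    h, w = len(image), len(image[0])
    return [[image[y * h // new_h][x * w // new_w] for x in range(new_w)]
            for y in range(new_h)]


def normalise(image, max_value=255):
    """Scale pixel values from [0, max_value] to [0, 1]."""
    return [[pixel / max_value for pixel in row] for row in image]


image = [[0, 255],
         [51, 102]]
resized = resize_nearest(image, 224, 224)
prepared = normalise(resized)
print(len(prepared), len(prepared[0]))
print(prepared[0][0], prepared[0][223], prepared[223][0], prepared[223][223])
```

Dividing by the maximum pixel value keeps the inputs in the range expected by many pre-trained models, although the exact expected range depends on the model and should be checked against its documentation.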

4.4.1 Kylberg dataset

The first dataset to be studied was the Kylberg dataset. The first model is developed using the MobileNetV3 small model, trained on the ImageNet dataset. The top layer of the pre-trained model is removed and replaced by a softmax layer with 28 classes. The model was fully fine-tuned, i.e. all the model layers were trained on the training dataset. The proposed model was trained for 10 epochs on the training dataset. The model achieved an accuracy of 100% on the testing dataset. The classification report and confusion matrix of model 1 on the testing data are shown in Table 4 and Fig. 11 respectively. The accuracy vs epochs graph and the loss vs epochs graph of model 1 for the Kylberg dataset during training are shown in Fig. 12.

Table 4 Classification report for model 1 Kylberg dataset

Fig. 11 Model 1 confusion matrix for the Kylberg dataset

Fig. 12 Model 1 accuracy and loss graphs for the Kylberg dataset

The second model is developed using the InceptionV3 model trained on the ImageNet dataset. The top layer of the pre-trained model is removed and replaced by a softmax layer with 28 classes. The pre-trained model was used as a feature extractor, i.e. all the layers of the pre-trained model were frozen, and only the top layer was trained on the training dataset. The proposed model was trained for 10 epochs on the training dataset. The model achieved an accuracy of 99.8883% on the testing dataset. The classification report and confusion matrix of model 2 on the testing data are shown in Table 5 and Fig. 13 respectively. The accuracy vs epochs graph and the loss vs epochs graph of model 2 for the Kylberg dataset during training are shown in Fig. 14.

Table 5 Classification report for model 2 Kylberg dataset

Fig. 13 Model 2 confusion matrix for the Kylberg dataset

Fig. 14 Model 2 accuracy and loss graphs for the Kylberg dataset

4.4.2 Brodatz dataset

The second dataset to be studied was the Brodatz dataset. The first model is developed using the MobileNetV3 small model, trained on the ImageNet dataset. The top layer of the pre-trained model is removed and replaced by a softmax layer with 112 classes. The pre-trained model was used as a feature extractor, i.e. all the layers of the pre-trained model were frozen, and only the top layer was trained on the training dataset. The proposed model was trained for 7 epochs on the training dataset. The model achieved an accuracy of 99.6651% on the testing dataset. The classification report of model 1 on the testing data is shown in Table 6. The accuracy vs epochs graph and the loss vs epochs graph of model 1 for the Brodatz dataset during training are shown in Fig. 15.

Table 6 Classification report for model 1 Brodatz dataset

Fig. 15 Model 1 accuracy and loss graphs for the Brodatz dataset

The second model is developed using the InceptionV3 model trained on the ImageNet dataset. The top layer of the pre-trained model is removed and replaced by a softmax layer with 112 classes. The pre-trained model was used as a feature extractor, i.e. all the layers of the pre-trained model were frozen, and only the top layer was trained on the training dataset. The proposed model was trained for 7 epochs on the training dataset. The model achieved an accuracy of 99.8884% on the testing dataset. The classification report of model 2 on the testing data is shown in Table 7. The accuracy vs epochs graph and the loss vs epochs graph of model 2 for the Brodatz dataset during training are shown in Fig. 16.

Table 7 Classification report for model 2 Brodatz dataset

Fig. 16 Model 2 accuracy and loss graphs for the Brodatz dataset

4.4.3 Outex dataset

The third dataset to be studied was the Outex dataset. The first model is developed using the MobileNetV3 small model, trained on the ImageNet dataset. The top layer of the pre-trained model is removed and replaced by a softmax layer with 24 classes, one for each class of the Outex_TC_00012 dataset listed in Section 3.1.3. The pre-trained model was used as a feature extractor, i.e. all the layers of the pre-trained model were frozen, and only the top layer was trained on the training dataset. The proposed model was trained for 5 epochs on the training dataset. The model achieved an accuracy of 99.479% on the testing dataset. The classification report and confusion matrix of model 1 on the testing data are shown in Table 8 and Fig. 17 respectively. The accuracy vs epochs graph and the loss vs epochs graph of model 1 for the Outex dataset during training are shown in Fig. 18.

Table 8 Classification report for model 1 Outex dataset

Fig. 17 Model 1 confusion matrix for the Outex dataset

Fig. 18 Model 1 accuracy and loss graphs for the Outex dataset

The second model is developed using the InceptionV3 model trained on the ImageNet dataset. The top layer of the pre-trained model is removed and replaced by a softmax layer with 24 classes, one for each class of the Outex_TC_00012 dataset listed in Section 3.1.3. The pre-trained model was used as a feature extractor, i.e. all the layers of the pre-trained model were frozen, and only the top layer was trained on the training dataset. The proposed model was trained for 5 epochs on the training dataset. The model achieved an accuracy of 99.479% on the testing dataset. The classification report and confusion matrix of model 2 on the testing data are shown in Table 9 and Fig. 19 respectively. The accuracy vs epochs graph and the loss vs epochs graph of model 2 for the Outex dataset during training are shown in Fig. 20.

Table 9 Classification report for model 2 Outex dataset

Fig. 19 Model 2 confusion matrix for the Outex dataset

Fig. 20 Model 2 accuracy and loss graphs for the Outex dataset

4.5 Comparative study

The results of the two proposed models are compared with other recently proposed models. Tables 10, 11 and 12 show the comparison between the two proposed models and other recently applied models on the Kylberg, Brodatz and Outex datasets, respectively.

Table 10 Performance comparison of our models with the existing techniques for the Kylberg dataset
Table 11 Performance comparison of our models with the existing techniques for the Brodatz dataset
Table 12 Performance comparison of our models with the existing techniques for the Outex TC-00012 dataset

4.6 Discussion

In this experiment, the pre-trained models used are trained on the ImageNet dataset and are openly available for use. The models were trained and tested in two settings. In the first, the pre-trained model was used as a feature extractor, and only the last layer was trained on the dataset. In the second, the whole model was trained on the training dataset. The feature extraction setting yielded better results and shorter training time in most cases. As mentioned in Section 3.2, all the images were rescaled to a size of 224*224*3 to make them compatible with the pre-trained models. The datasets were then split in a ratio of 80:20 into training and testing data. Tables 10, 11 and 12 in Section 4.5 compare the results of our method with the previously proposed methods. From the tables, it is evident that our methods have outperformed the previously proposed methods.

5 Conclusion and future scope

Texture classification is an essential area of research that has attracted many researchers to propose different models. From the comparative study, it can be concluded that our models give better results than most of the existing models for the Kylberg and Outex datasets. Both models got a testing accuracy of 100% on the Kylberg and Outex datasets. Our models gave competitive results for the Brodatz dataset too. Despite being used only as feature extractors (except for MobileNetV3 on the Kylberg dataset), the models attained outstanding results. This suggests that the datasets in the study and the ImageNet dataset have very similar feature spaces. Hence, it can be concluded that transfer learning can be used to quickly solve tasks where the feature space of the target dataset is similar to the feature space of the dataset on which the pre-trained model is trained.

In future, we would like to test our models on more texture datasets and also use them in other domains such as medical and aerial imagery. The similarity between the feature spaces of the source and target datasets has a large impact on model performance. This study used models that were trained on the ImageNet dataset. We also aim to extend this work to transformer-based architectures, and to expand the study by using the same model architectures trained on a different dataset. Using different source datasets for standard architectures and different target datasets can help to understand transfer learning more deeply.