1 Introduction

Image Processing (IP) stands as a multifaceted field encompassing a range of methodologies dedicated to gleaning valuable insights from images. Concurrently, the landscape of Artificial Intelligence (AI) has burgeoned into an expansive realm of exploration, serving as the conduit through which intelligent machines strive to replicate human cognitive capacities. Within the expansive domain of AI, Machine Learning (ML) emerges as a pivotal subset, empowering models to autonomously extrapolate outcomes from structured datasets, effectively diminishing the need for explicit human intervention in the decision-making process. At the heart of ML lies Deep Learning (DL), a subset that transcends conventional techniques, particularly in handling unstructured data. DL boasts an unparalleled potential for achieving remarkable accuracy, at times even exceeding human-level performance. This prowess, however, hinges on the availability of copious data to train intricate neural network architectures, characterized by their multilayered composition. Unlike their traditional counterparts, DL models exhibit an innate aptitude for feature extraction, a task that historically posed challenges. This proficiency can be attributed to the architecture's capacity to inherently discern pertinent features, bypassing the need for explicit feature engineering. Rooted in the aspiration to emulate cognitive processes, DL strives to engineer learning algorithms that faithfully mirror the intricacies of the human brain. In this paper, a diverse range of deep learning methodologies, contributed by various researchers, is elucidated within the context of Image Processing (IP) techniques.

This comprehensive compendium delves into the diverse and intricate landscape of Image Processing (IP) techniques, encapsulating the domains of image restoration, enhancement, segmentation, feature extraction, and classification. Each domain serves as a cornerstone in the realm of visual data manipulation, contributing to the refinement, understanding, and utilization of images across a plethora of applications.

Image restoration techniques constitute a critical first step in rectifying image degradation and distortion. These methods, encompassing denoising, deblurring, and inpainting, work tirelessly to reverse the effects of blurring, noise, and other forms of corruption. By restoring clarity and accuracy, these techniques lay the groundwork for subsequent analyses and interpretations, essential in fields like medical imaging, surveillance, and more.

The purview extends to image enhancement, where the focus shifts to elevating image quality through an assortment of adjustments. Techniques that manipulate contrast, brightness, sharpness, and other attributes enhance visual interpretability. This enhancement process, applied across diverse domains, empowers professionals to glean finer details, facilitating informed decision-making and improved analysis.

The exploration further extends to image segmentation, a pivotal process for breaking down images into meaningful regions. Techniques such as clustering and semantic segmentation aid in the discernment of distinct entities within images. The significance of image segmentation is particularly pronounced in applications like object detection, tracking, and scene understanding, where it serves as the backbone of accurate identification and analysis.

Feature extraction emerges as a fundamental aspect of image analysis, entailing the identification of crucial attributes that pave the way for subsequent investigations. While traditional methods often struggle to encapsulate intricate attributes, deep learning techniques excel in autonomously recognizing complex features, contributing to a deeper understanding of images and enhancing subsequent analysis.

Image classification, a quintessential task in the realm of visual data analysis, holds prominence. This process involves assigning labels to images based on their content, playing a pivotal role in areas such as object recognition and medical diagnosis. Both machine learning and deep learning techniques are harnessed to automate the accurate categorization of images, enabling efficient and effective decision-making.

Sect. 1 introduces the image processing operations considered in this review. Sect. 2 provides a comprehensive overview of the evaluation metrics employed for the various image processing operations. Sect. 3 offers an in-depth exploration of the diverse range of Deep Learning (DL) models tailored specifically for image preprocessing tasks. Sect. 4 thoroughly examines the array of DL methods harnessed for image segmentation, unraveling their techniques and applications.

Sect. 5 dissects DL strategies for feature extraction, elucidating their significance and effectiveness. In Sect. 6, the spotlight shifts to DL models designed for the intricate task of image classification, delving into their architectures and performance characteristics. The significance of each model is discussed in Sect. 7. Concluding this comprehensive analysis, Sect. 8 encapsulates the synthesized findings and key takeaways, consolidating the insights gleaned from the study.

The papers discussed in this review collectively present a panorama of DL methodologies spanning various application domains. Notably, these domains encompass medical imagery, satellite imagery, botanical studies involving flower images, fruit images, and even real-time image scenarios. Each domain's unique challenges and intricacies are met with tailored DL approaches, underscoring the adaptability and potency of these methods across diverse real-world contexts.

2 Metrics for image processing operations

Evaluation metrics serve as pivotal tools in the assessment of the efficacy and impact of diverse image processing techniques. These metrics serve the essential purpose of furnishing quantitative measurements that empower researchers and practitioners to undertake an unbiased analysis and facilitate meaningful comparisons among the outcomes yielded by distinct methods. By employing these metrics, the intricate and often subjective realm of image processing can be rendered more objective, leading to informed decisions and advancements in the field.

2.1 Metrics for image preprocessing

2.1.1 Mean squared error (MSE)

The average of the squared differences between predicted and actual values. It penalizes larger errors more heavily.

$$MSE=\frac{1}{M\,N}\sum_{i=1}^{M}\sum_{j=1}^{N}{\left({Original}_{(i,j)}-{Denoised}_{(i,j)}\right)}^{2}$$

where M and N are the dimensions of the image, and \({Original}_{(i,j)}\) and \({Denoised}_{(i,j)}\) are the pixel values at position (i, j) in the original and denoised images, respectively.
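As a minimal NumPy sketch (the function name `mse` and the grayscale-array inputs are illustrative assumptions, not from the source):

```python
import numpy as np

def mse(original, denoised):
    """Mean squared error between two equally sized grayscale images."""
    original = np.asarray(original, dtype=np.float64)
    denoised = np.asarray(denoised, dtype=np.float64)
    return np.mean((original - denoised) ** 2)
```

Because the squaring step penalizes large deviations quadratically, a single badly restored pixel can dominate the score.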

2.1.2 Peak signal-to-noise ratio (PSNR)

PSNR is commonly used to measure the quality of restored images. It compares the original and restored images by considering the mean squared error between their pixel values.

$$PSNR=10\,{\text{log}}_{10}\left(\frac{{MAX}^{2}}{MSE}\right)$$

where, MAX is the maximum possible pixel value (255 for 8-bit images), MSE is the mean squared error between the original and denoised images.
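A hedged sketch of the formula above in NumPy (the function name `psnr` is an assumption; identical images are mapped to infinity since their MSE is zero):

```python
import numpy as np

def psnr(original, denoised, max_val=255.0):
    """Peak signal-to-noise ratio in decibels for 8-bit images by default."""
    err = np.mean((np.asarray(original, dtype=float)
                   - np.asarray(denoised, dtype=float)) ** 2)
    if err == 0:
        return float("inf")  # identical images: no noise at all
    return 10.0 * np.log10(max_val ** 2 / err)
```

Higher PSNR indicates better restoration; values around 30–50 dB are typical for good-quality denoising of 8-bit images.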

2.1.3 Structural similarity index (SSIM)

SSIM is applicable to image restoration as well. It assesses the similarity between the original and restored images in terms of luminance, contrast, and structure. Higher SSIM values indicate better restoration quality.

$$SSIM(x,y)=\frac{\left(2\,{\mu }_{x}\,{\mu }_{y}+{c}_{1}\right)\left(2\,{\sigma }_{xy}+{c}_{2}\right)}{\left({\mu }_{x}^{2}+{\mu }_{y}^{2}+{c}_{1}\right)\left({\sigma }_{x}^{2}+{\sigma }_{y}^{2}+{c}_{2}\right)}$$

where \({\mu }_{x}\) and \({\mu }_{y}\) are the mean values of the original and denoised images, \({\sigma }_{x}^{2}\) and \({\sigma }_{y}^{2}\) are their variances, \({\sigma }_{xy}\) is the covariance between the two images, and \({c}_{1}\) and \({c}_{2}\) are constants to avoid division by zero.
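The formula can be sketched directly in NumPy. This simplified version (name `global_ssim` is an assumption) computes SSIM once over the whole image; the standard metric instead averages this quantity over local windows, which is exactly what MSSIM below formalizes:

```python
import numpy as np

def global_ssim(x, y, max_val=255.0):
    """Single-window SSIM over the entire image."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (0.01 * max_val) ** 2  # conventional stabilizing constants
    c2 = (0.03 * max_val) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den
```

Identical images yield an SSIM of exactly 1; any luminance, contrast, or structural mismatch pushes the value below 1.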

2.1.4 Mean structural similarity index (MSSIM)

MSSIM extends SSIM to multiple patches of the image and calculates the mean SSIM value over those patches.

$$MSSIM=\frac{1}{N}\sum_{i=1}^{N}SSIM({x}_{i},{y}_{i})$$

where \({x}_{i}\) and \({y}_{i}\) are corresponding patches of the original and enhanced images, and N is the number of patches.

2.1.5 Mean absolute error (MAE)

The average of the absolute differences between predicted and actual values. It provides a more robust measure against outliers.

$$MAE=\frac{1}{n}\sum_{i=1}^{n}\left|{y}_{actual,i}-{y}_{predicted,i}\right|$$

where n is the number of samples.

2.1.6 NIQE (Naturalness image quality evaluator)

NIQE quantifies the naturalness of an image by measuring the deviation of local statistics from natural images. It calculates the mean of the local differences in luminance and contrast.

2.1.7 FID (Fréchet inception distance)

FID measures the distance between two distributions (real and generated images) using the Fréchet distance between their feature representations calculated by a pre-trained neural network.

2.2 Metrics for image segmentation

2.2.1 Intersection over union (IoU)

IoU measures the overlap between the predicted region (a bounding box or segmentation mask) and the corresponding ground truth region. It is commonly used to evaluate object detection and segmentation models.

$$IoU=\frac{Segmented\, Image\,\cap\,Ground\,Truth\,Image}{Segmented\, Image\,\cup\,Ground\,Truth\,Image}$$
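For binary masks, the intersection-over-union ratio can be sketched as follows (the function name `iou` and the convention of returning 1.0 for two empty masks are assumptions):

```python
import numpy as np

def iou(pred_mask, gt_mask):
    """Intersection over union of two binary segmentation masks."""
    pred = np.asarray(pred_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return intersection / union if union else 1.0  # both empty: perfect match
```

An IoU of 1 means the prediction and ground truth coincide exactly; detection benchmarks often count a prediction as correct when IoU exceeds a threshold such as 0.5.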

2.2.2 Average precision (AP)

AP measures the precision at different recall levels and computes the area under the precision-recall curve. Used to assess object detection and instance segmentation models.

2.2.3 Dice similarity coefficient

The Dice similarity coefficient is another measure of similarity between the predicted segmentation and ground truth. It considers both false positives and false negatives.

$$Dice=\frac{2*(Segmented\, Image\,\cap\,Ground\,Truth\,Image)}{Area\,of\,predicted \,segmentation+Area\, of\, ground\, truth}$$

The Dice Similarity Coefficient, also known as the Sørensen-Dice coefficient, is a common metric for evaluating the similarity between two sets. In the context of image segmentation, it quantifies the overlap between the predicted segmentation and the ground truth, rewarding true positives while penalizing both false positives and false negatives. DSC ranges from 0 to 1, where higher values indicate better overlap between the predicted and ground truth segmentations. A DSC of 1 corresponds to a perfect match.
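A minimal sketch for binary masks, mirroring the formula above (the name `dice` and the empty-mask convention are assumptions):

```python
import numpy as np

def dice(pred_mask, gt_mask):
    """Sørensen-Dice coefficient: 2|A ∩ B| / (|A| + |B|) for binary masks."""
    pred = np.asarray(pred_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)
    intersection = np.logical_and(pred, gt).sum()
    denom = pred.sum() + gt.sum()
    return 2.0 * intersection / denom if denom else 1.0  # both empty
```

Dice and IoU are monotonically related (Dice = 2·IoU / (1 + IoU)), so they rank segmentations identically, but Dice weights the overlap more generously.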

2.2.4 Average accuracy (AA)

Average Accuracy measures the overall accuracy of the segmentation by calculating the percentage of correctly classified pixels across all classes.

$$AA=\frac{1}{N}\sum_{i=1}^{N}\frac{{\mathrm{True\, Positives}}_{{\text{i}}}+{\mathrm{True\,Negative}}_{{\text{i}}}}{{\mathrm{Total\,Pixels}}_{{\text{i}}}}$$

where N is the number of classes, \({\mathrm{True\, Positives}}_{i}\) and \({\mathrm{True\, Negatives}}_{i}\) are the true positives and true negatives for class i, and \({\mathrm{Total\, Pixels}}_{i}\) is the total number of pixels evaluated for class i.
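A sketch of this metric for label maps, interpreting each per-class term as a one-vs-rest pixel accuracy over the whole image (this interpretation of \({\mathrm{Total\, Pixels}}_{i}\), and the function name, are assumptions):

```python
import numpy as np

def average_accuracy(pred, gt, num_classes):
    """Mean over classes of the one-vs-rest accuracy (TP_i + TN_i) / total."""
    pred = np.asarray(pred)
    gt = np.asarray(gt)
    total = pred.size
    per_class = []
    for c in range(num_classes):
        tp = np.sum((pred == c) & (gt == c))   # pixels correctly labeled c
        tn = np.sum((pred != c) & (gt != c))   # pixels correctly not labeled c
        per_class.append((tp + tn) / total)
    return float(np.mean(per_class))
```

Averaging per class keeps a dominant background class from masking poor performance on small classes.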

2.3 Metrics for feature extraction and classification

2.3.1 Accuracy

The ratio of correctly predicted instances to the total number of instances. It's commonly used for balanced datasets but can be misleading for imbalanced datasets.

$$Accuracy=\frac{True\, Positives+True\, Negatives}{Total\, Predictions}$$

2.3.2 Precision

The ratio of true positive predictions to the total number of positive predictions. It measures the model’s ability to avoid false positives.

$$Precision=\frac{True\, Positives}{True\, Positives+False \,Positives}$$

2.3.3 Recall (Sensitivity or true positive rate)

The ratio of true positive predictions to the total number of actual positive instances. It measures the model’s ability to correctly identify positive instances.

$$\mathrm{Recall }\left({\text{Sensitivity}}\right)=\frac{\mathrm{True\,Positives}}{\mathrm{True\,Positives}+\mathrm{False\,Negatives}}$$

2.3.4 F1-Score

The harmonic mean of precision and recall. It provides a balanced measure between precision and recall.

$$F{1}_{score}=2*\frac{Precision*Recall}{Precision+Recall}$$

2.3.5 Specificity (True negative rate)

The ratio of true negative predictions to the total number of actual negative instances.

$$Specificity=\frac{True\,Negatives}{True\,Negatives+False\,Positives}$$
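The five confusion-matrix metrics above can be sketched together from the four raw counts (the function name and the convention of returning 0.0 for undefined ratios are assumptions):

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, specificity, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}
```

Reporting precision and recall (or F1) alongside accuracy is especially important on imbalanced datasets, where accuracy alone can be misleading.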

2.3.6 ROC curve (Receiver operating characteristic curve)

A graphical representation of the trade-off between the true positive rate and the false positive rate as the classification threshold varies. The ROC curve and the area under it (AUC) are commonly used in binary classification: the curve plots the trade-off, and the AUC summarizes the curve's performance in a single number.
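The AUC admits a compact rank-based sketch: it equals the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counted as one half (the function name `roc_auc` is an assumption):

```python
import numpy as np

def roc_auc(scores, labels):
    """AUC via its rank interpretation over all positive/negative pairs."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos = scores[labels]
    neg = scores[~labels]
    # Count pairs where the positive outranks the negative; ties count 0.5.
    wins = ((pos[:, None] > neg[None, :]).sum()
            + 0.5 * (pos[:, None] == neg[None, :]).sum())
    return wins / (pos.size * neg.size)
```

An AUC of 1.0 means every positive is ranked above every negative, while 0.5 corresponds to random scoring.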

3 Image preprocessing

Image preprocessing is a fundamental step in the field of image processing that involves a series of operations aimed at preparing raw or unprocessed images for further analysis, interpretation, or manipulation. This crucial phase helps enhance the quality of images, mitigate noise, correct anomalies, and extract relevant information, ultimately leading to more accurate and reliable results in subsequent tasks such as image analysis, recognition, and classification.

Image preprocessing is broadly categorized into image restoration, which removes noise and blur from images, and image enhancement, which improves the contrast, brightness, and detail of images.

3.1 Image restoration

Image restoration serves as a pivotal process aimed at reclaiming the integrity and visual quality of images that have undergone degradation or distortion. Its objective is to transform a degraded image into a cleaner, more accurate representation, thereby revealing concealed details that may have been obscured. This process is particularly vital in scenarios where images have been compromised due to factors like digital image acquisition issues or post-processing procedures such as compression and transmission. By rectifying these issues, image restoration contributes to enhancing the interpretability and utility of visual data.

A notable adversary in the pursuit of pristine images is noise, an unintended variation in pixel values that introduces unwanted artifacts and can lead to the loss of important information. Different types of noise, such as Gaussian noise characterized by its random distribution, salt and pepper noise causing sporadic bright and dark pixels, and speckle noise resulting from interference, can mar the quality of images. These disturbances often originate from the acquisition process or subsequent manipulations of the image data.
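The three noise types named above can be simulated directly, which is also how denoising methods are commonly benchmarked. A minimal sketch for 8-bit grayscale arrays (function names, the fixed seed, and the default parameters are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility

def add_gaussian(img, sigma=10.0):
    """Additive Gaussian noise: random perturbations drawn from N(0, sigma^2)."""
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0, 255)

def add_salt_pepper(img, amount=0.05):
    """Salt-and-pepper noise: a fraction of pixels forced to pure black or white."""
    out = img.copy()
    mask = rng.random(img.shape)
    out[mask < amount / 2] = 0        # pepper: sporadic dark pixels
    out[mask > 1 - amount / 2] = 255  # salt: sporadic bright pixels
    return out

def add_speckle(img, sigma=0.1):
    """Speckle noise: multiplicative interference, I * (1 + n)."""
    return np.clip(img * (1.0 + rng.normal(0.0, sigma, img.shape)), 0, 255)
```

A restoration pipeline is then evaluated by corrupting a clean image with one of these functions, denoising it, and scoring the result with the metrics of Sect. 2.1 (MSE, PSNR, SSIM).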

Historically, traditional image restoration techniques have included an array of methods to mitigate the effects of degradation and noise. These techniques encompass constrained least square filters, blind deconvolution methods that aim to reverse blurring effects, Wiener and inverse filters for enhancing signal-to-noise ratios, as well as Adaptive Mean, Order Statistic, and Alpha-trimmed mean filters that tailor filtering strategies to the local pixel distribution. Additionally, algorithms dedicated to deblurring counteract motion or optical-induced blurriness, restoring sharpness. Denoising techniques (Tian et al.

Archana, R., Jeevaraj, P.S.E. Deep learning models for digital image processing: a review. Artif Intell Rev 57, 11 (2024). https://doi.org/10.1007/s10462-023-10631-z