Introduction

Underwater object detection is crucial for assessing marine biodiversity, including the distribution, quantity, and species of marine life. It plays a key role in monitoring ecological shifts in marine environments and provides essential data for conserving fishery ecosystems [1,2,3]. The efficacy of underwater object detection hinges on the algorithm's adaptability to the multifaceted marine environment, which is distinct from terrestrial and aerial settings due to factors like water quality-induced color shifts, variable light intensity, and other optical disturbances. Challenges such as the prevalence of small-scale object clusters and the detection of marine organisms with complex textures, camouflaged against their surroundings, underscore the importance of developing robust underwater object detection algorithms for fishery ecosystem surveillance.

Current solutions for underwater object detection in complex lighting conditions are categorized into two approaches. The first leverages traditional digital image processing or deep learning-based image enhancement models to clarify images before detection, yielding impressive results [4, 5]. For instance, Zhang et al. enhanced night-time underwater videos using MSRCP before employing DetNASNet and Cascade R-CNN for precise nocturnal fish detection [6]. Similarly, Lu et al. proposed a CNN-based two-stage enhancement method to convert degraded underwater images into clear visuals for subsequent detection tasks [7]. However, these solutions are resource-intensive, requiring computational allocation for both image enhancement and detection models. The second approach bypasses image enhancement, feeding raw images into detection models with modified modules to bolster resistance to optical disturbances. For example, Liu et al. introduced a hybrid attention mechanism within a Deep Residual CNN to iteratively extract features, countering light and shadow effects in marine ecosystems [8]. Despite their complexity, such models lack dedicated modules for diverse light interferences and falter without pre-enhancement.

Addressing the challenge of detecting small target clusters, some strategies involve incorporating multi-scale feature extraction modules or deepening the network to improve overall detection capabilities. Gao et al.'s introduction of an augmented weighted bi-directional feature pyramid network (AWBiFPN) exemplifies this, enhancing the detection of fine-grained features for small objects and achieving high MAP scores across major underwater datasets [9]. However, these enhancements typically increase network size and computational demands, which is problematic for resource-constrained underwater detection applications. Efforts to streamline these expanded networks, such as pruning or novel architectural designs, often struggle to maintain original accuracy levels.

Research on detecting camouflaged underwater objects is scant, lacking a definitive focus. Some studies borrow from conventional camouflage and salient object detection techniques, adjusting network modules for better hidden target identification. Xu et al.'s adversarial learning-based adaptive frame regression network is one such example, outperforming existing models in detecting camouflaged underwater targets [10]. Yet, these adaptations may inadvertently accentuate feature disparities between the camouflaged object's edges and the background, potentially undermining the efficacy of general object detection networks. Addressing camouflage challenges may thus require innovative loss function designs.

This paper introduces a comprehensive approach: initially utilizing deep and ordinary convolutional layers with varying sizes and receptive fields to mimic the light interference countermeasures of cone and rod photoreceptors, followed by employing spatial and channel attention mechanisms for feature weighting. Subsequently, the YOLOv8 backbone's ordinary convolutions are replaced with RepVgg modules, enhancing the network's capability to detect small object clusters through structural reparameterization and feature map fusion. Finally, replacing CIOU with WIOU for regression loss minimizes the impact of adverse gradients on camouflaged features. This method markedly improves overall MAP, as well as MAP scores for small and camouflaged objects, offering a robust solution for underwater detection in complex scenarios, including light interference, small object clustering, and camouflage.

Related work

Analysis of data sets

The dataset utilized in this study, curated by Fu et al. from Dalian University of Technology, represents a comprehensive collection of underwater imagery. This dataset comprises 14,000 high-resolution images and 74,903 labeled instances, segmented into a training and testing set at an 8:2 ratio. The images maintain a minimum resolution of 171 × 262 pixels, with object heights ranging from 1 to 3618 pixels. The labeled categories span a diverse array of marine life, including fish, divers, starfish, coral, turtles, echinuses, holothurians, scallops, cuttlefish, and jellyfish, encompassing 10 distinct categories as referenced in [11]. The distribution of these categories is illustrated in Fig. 1.

Fig. 1
figure 1

Percentage of labels by category

When compared to existing Underwater Object Detection (UOD) datasets, the images in this study encompass a broader range of complex marine environments. For instance, variations in water quality across different areas contribute to light refraction and scattering issues, such as fog effects, color biases, and extreme lighting conditions, both low and high. The natural grouping behavior of certain marine species like fish and jellyfish introduces challenges in detecting small object clusters. Furthermore, marine life exhibits a high degree of morphological diversity within classes and inter-class similarities, often compounded by their camouflage abilities. Examples include cuttlefish and certain species of corals and turtles, which possess subtle camouflage traits. These elements collectively escalate the difficulty of accurately detecting objects underwater. Figure 2 visually demonstrates these complexities, showcasing instances of small object clustering, camouflage challenges, and various light interference issues like fog effects, color deviations, and extreme lighting conditions.

Fig. 2
figure 2

Complex underwater environment

Data pre-processing

This study employs two principal data augmentation strategies to enhance model performance: mosaic data augmentation [12] and mixup data augmentation; a minimal sketch of mixup is given below.
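As a concrete illustration, below is a minimal sketch of mixup as it is commonly implemented for detection tasks; the Beta-distribution parameter and the convention of keeping both images' boxes are typical choices, not details confirmed by the paper.

```python
import numpy as np

def mixup(img_a, img_b, boxes_a, boxes_b, alpha=8.0):
    """Blend two images pixel-wise and keep both sets of boxes.

    img_a, img_b: float32 arrays of identical shape (H, W, 3).
    boxes_a, boxes_b: lists of bounding-box labels for each image.
    alpha: Beta-distribution parameter (a common detection-mixup setting).
    """
    lam = np.random.beta(alpha, alpha)           # mixing coefficient in (0, 1)
    mixed = lam * img_a + (1.0 - lam) * img_b    # pixel-wise blend
    # Detection-style mixup keeps the boxes of both source images; some
    # variants also attach lam / (1 - lam) as per-box loss weights.
    return mixed, boxes_a + boxes_b
```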

The proposed model architecture is sketched in detail in Fig. 6 (right) and Fig. 7; its channel and spatial attention mechanisms, used for feature weighting, are formulated as:

$$Channel\_attention(F_{in}) = \sigma\left(W_2 * W_1 * MAXPool(F_{in}) + W_2 * W_1 * AVGPool(F_{in})\right)$$
$$Spatial\_attention(F_{in}) = \sigma\left(conv_{7 \times 7}\left(\left[MAXPool_{ch}(F_{in});\; AVGPool_{ch}(F_{in})\right]\right)\right)$$
Fig. 7
figure 7

Sketch of the two attention modules: channel attention (left) and spatial attention (right) (r in the left figure represents the channel compression multiplier in the attention module)

Equation 1: \(W_1\) is the weight of the first fully connected layer and \(W_2\) the weight of the second; these fully connected layer weights are shared between the max-pooling and average-pooling branches.
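As a concrete illustration of Eq. (1) and the spatial attention formula, here is a minimal PyTorch sketch in the style of CBAM-type modules; the reduction ratio r, the ReLU between W1 and W2, and the use of 1 × 1 convolutions as shared fully connected layers are implementation assumptions of this sketch rather than details confirmed by the paper.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Eq. (1): a shared two-layer MLP applied to max- and avg-pooled descriptors."""
    def __init__(self, channels: int, r: int = 16):
        super().__init__()
        # W1 compresses the channels by r, W2 restores them; the weights are
        # shared between the max-pool and avg-pool branches.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // r, kernel_size=1, bias=False),  # W1
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // r, channels, kernel_size=1, bias=False),  # W2
        )

    def forward(self, x):
        max_desc = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))  # MAXPool
        avg_desc = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))  # AVGPool
        return torch.sigmoid(max_desc + avg_desc)                     # sigma(...)

class SpatialAttention(nn.Module):
    """7 x 7 convolution over channel-wise max- and avg-pooled maps, then sigmoid."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3, bias=False)

    def forward(self, x):
        max_map = torch.amax(x, dim=1, keepdim=True)   # MAXPool_ch
        avg_map = torch.mean(x, dim=1, keepdim=True)   # AVGPool_ch
        return torch.sigmoid(self.conv(torch.cat([max_map, avg_map], dim=1)))
```

In use, the input feature map is multiplied element-wise by the channel attention map and then by the spatial attention map.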

Small object and structural re-parameterization improvement networks

Detecting small underwater objects presents significant challenges due to their diminutive scale, mutual occlusion, or overlap among object groups, making it difficult to capture location details and distinguish boundaries [18, 19]. This paper introduces a structural reparameterization multi-scale feature extraction module, inspired by RepVgg, into the model's Backbone and Neck. This innovation enhances the model's proficiency in learning multi-scale features and extracting diverse semantic information on background, foreground, object edges, and textures. It effectively addresses the clustering issue of small underwater objects while achieving lossless compression of the model.

Distinct from conventional multi-scale feature extraction modules (e.g., SPP [20], Inception [21]), RepVgg's architecture [22] differs during training and inference phases. During training, the structure bifurcates into three branches: a 3 × 3 convolution + BN branch, a 1 × 1 convolution + BN branch, and an identity mapping + BN branch. At inference, the module undergoes structural reparameterization, simplifying into a singular 3 × 3 convolutional layer. This reparameterization process involves convolutionalizing and fusing all BN operations with the original convolutional kernel into one operator, converting each branch into a path with only 3 × 3 convolutions. Then, applying the distributive law of convolution operations, the weights and biases in the convolutional kernels are aggregated to form a unified 3 × 3 convolution, enhancing structural efficiency (Fig. 8).

Fig. 8
figure 8

RepVgg module structure reparameterization flow
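To make the training-time topology concrete, the following is a minimal PyTorch sketch of the three-branch structure described above, following the public RepVGG design; the activation choice and the restriction of the identity branch to stride-1, equal-channel blocks are standard RepVGG details assumed here.

```python
import torch.nn as nn

class RepVggBlock(nn.Module):
    """Training-time structure: 3x3 conv + BN, 1x1 conv + BN, identity + BN."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv3x3 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn3x3 = nn.BatchNorm2d(channels)
        self.conv1x1 = nn.Conv2d(channels, channels, 1, bias=False)
        self.bn1x1 = nn.BatchNorm2d(channels)
        self.bn_id = nn.BatchNorm2d(channels)  # identity branch (stride 1 only)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        # The three branches are summed, then activated; at inference the sum
        # collapses into a single 3x3 convolution via reparameterization.
        return self.act(self.bn3x3(self.conv3x3(x))
                        + self.bn1x1(self.conv1x1(x))
                        + self.bn_id(x))
```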

In the benchmark model YOLOv8, feature maps fed into the Neck network derive from the second to fourth C2F blocks, excluding the outputs of the initial convolutional module and the first C2F block. However, because the detection of small object clusters is sensitive to spatial information, retaining the shallow feature extraction maps is crucial: these maps are rich in spatial and edge details. Consequently, the output of the first convolutional layer (aligned with the cone-rod cell module's output) and the first C2F block are preserved. These outputs undergo processing via a RepVgg module and a max-pooling module before being concatenated with the detection head's input feature maps (with dimensions of 80 × 80 and 40 × 40) along the channel dimension (Fig. 9), optimizing the model for small object detection in complex underwater environments.
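The routing just described can be pictured with a shape-level sketch that reuses the RepVggBlock above; every channel count, resolution, and pooling stride here is illustrative only, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

# Hypothetical shapes: a shallow feature map from the first convolutional
# layer is refined by a RepVgg module, downsampled by max-pooling to match
# the 80 x 80 head input, and concatenated along the channel dimension.
shallow = torch.randn(1, 64, 320, 320)                    # early, spatially rich map
refined = RepVggBlock(64)(shallow)                        # RepVgg processing
pooled = nn.MaxPool2d(kernel_size=4, stride=4)(refined)   # 320 -> 80
head_in = torch.randn(1, 256, 80, 80)                     # existing head input
fused = torch.cat([head_in, pooled], dim=1)               # channel-wise concat
print(fused.shape)                                        # torch.Size([1, 320, 80, 80])
```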

$${y}_{i}=\frac{{x}_{i}-{u}_{i}}{\sqrt{{\sigma }_{i}^{2}+\varepsilon }}{\gamma }_{i}+{\beta }_{i}={x}_{i}\frac{{\gamma }_{i}}{{\sigma }_{i}}+{\beta }_{i}-\frac{{\gamma }_{i}{u}_{i}}{{\sigma }_{i}}$$
$$ W_{i}^{\prime } = \frac{{\gamma_{i} }}{{\sigma_{i} }} $$
$$ B^{\prime}_{i} = \beta_{i} - \frac{{\gamma_{i} u_{i} }}{{\sigma_{i} }} $$
Fig. 9
figure 9

Structure of the modified network, modified parts in color

Equation 2: BN layer convolutionalization (\({x}_{i}\) is the input, \({y}_{i}\) the output after BN, \({u}_{i}\) and \({\sigma }_{i}\) the mean and standard deviation of the \(i\)th channel, where \({\sigma }_{i}\) absorbs the \(\sqrt{{\sigma }_{i}^{2}+\varepsilon }\) stabilizer; \(\gamma_{i}\) and \(\beta_{i}\) are the learned parameters of the BN layer, and \(W^{\prime}_{i}\) and \(B^{\prime}_{i}\) the weight and bias after convolutionalization).

$$Out=\left(in*{W}_{1}+{B}_{1}\right)+\left(in*{W}_{2}+{B}_{2}\right)+\left(in*{W}_{3}+{B}_{3}\right)=in*\left({W}_{1}+{W}_{2}+{W}_{3}\right)+\left({B}_{1}+{B}_{2}+{B}_{3}\right)$$

Equation 3: parallel convolutional layer operator fusion (\(in\) represents the input feature map, \({W}_{i}\) the weight of the \(i\)th convolution, and \({B}_{i}\) the bias of the \(i\)th convolution).
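A minimal PyTorch sketch of Eqs. (2) and (3) follows: each BN layer is folded into its preceding convolution, and the resulting parallel kernels and biases are summed into one operator. Padding the 1 × 1 kernel to 3 × 3 and writing the identity branch as a centred-one 3 × 3 kernel, as the standard RepVGG procedure prescribes, are left as noted preconditions.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d):
    """Eq. (2): fold a BN layer into the preceding (bias-free) convolution."""
    std = torch.sqrt(bn.running_var + bn.eps)                      # sigma_i
    weight = conv.weight * (bn.weight / std).reshape(-1, 1, 1, 1)  # W'_i
    bias = bn.bias - bn.weight * bn.running_mean / std             # B'_i
    return weight, bias

@torch.no_grad()
def merge_parallel_branches(weights, biases):
    """Eq. (3): sum parallel 3x3 kernels and biases into a single operator.

    Precondition: 1x1 kernels are zero-padded to 3x3, and the identity
    branch is expressed as a 3x3 kernel with a one at the centre of each
    channel, so that all summands share the same shape.
    """
    return sum(weights), sum(biases)
```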

Camouflage issues with WIOU

Camouflaged object image processing stands as a notable area within computer vision, with considerable research dedicated to the detection and segmentation of camouflaged creatures in natural environments, as evidenced by datasets such as COD10k [23] and NC4k [24]. However, underwater camouflaged object detection remains relatively underexplored, presenting significant challenges for model performance due to the intricacies of object camouflage. This paper aims to enhance the network's ability to detect camouflaged objects by focusing on sample variety and loss function optimization.

The dataset employed comprises camouflaged or weakly camouflaged underwater objects. A critical aspect of successful detection lies in identifying unique textures that predominantly or exclusively characterize a given class. However, during network forward propagation, these distinctive textures can easily blend with common or similar textures and image noise, leading to a dilution or loss of unique texture information. This fusion compromises the accuracy of camouflaged object detection. Addressing this, the paper emphasizes the importance of increasing the representation of unique camouflage features in training to boost detection efficacy.

To this end, the training samples are categorized into three types: high-quality, ordinary, and low-quality samples, as illustrated in Fig. 10. Low-quality samples, characterized by complex environmental noise, present a challenge in object identification and localization due to excessive noise in feature extraction. Accordingly, the loss associated with these samples should be minimized. High-quality samples, with clear and distinct object imaging, are easier to recognize and localize. Despite their clarity, these samples often contain a blend of unique and common textures, suggesting a reduction in computed loss. Ordinary samples, likely to contain textures specific to the camouflaged object, warrant a higher loss weighting. This stratification aims to refine the network's focus on crucial texture details, enhancing the detection of camouflaged objects in underwater settings.

Fig. 10
figure 10

Low-quality samples, normal samples, and high-quality samples (the objects in the figure are holothurians)

The WIOU loss has been developed through three iterations [25]. Because computational resource constraints make a comprehensive grid search for optimal hyperparameter combinations infeasible, this research adopts the parameter combinations recommended by the original paper and uses WIOUv3 for its advancements over WIOUv1. WIOUv1's loss computation is bifurcated into IOU and RWIOU components, with RWIOU accounting for the ratio of the distance between the center points of the labeled and predicted boxes to the diagonal length of the smallest enclosing rectangle. This approach accentuates the IOU loss for ordinary samples while diminishing the RWIOU loss for high-quality samples, thereby lessening the emphasis on center distance when the anchor and object overlap substantially (Fig. 4).

$$IOU = \frac{W_{gt} * H_{gt} + W_{pred} * H_{pred} - W^{\prime} * H^{\prime}}{W^{\prime} * H^{\prime}}$$
$$RWIOU=\text{exp}\left(\frac{{\left({x}_{pred}-{x}_{gt}\right)}^{2}+{\left({y}_{pred}-{y}_{gt}\right)}^{2}}{{W}^{2}+{H}^{2}}\right)$$
$$WiseIO{U}_{v1} = RWIOU*IOU$$

Equation 4: \(WiseIO{U}_{v1}\) loss (\(W^{\prime}\) and \(H^{\prime}\) denote the width and height of the intersection of the labeled and predicted boxes; \(W\) and \(H\) those of their smallest enclosing rectangle; written this way, the IOU term grows as the overlap shrinks, acting as a loss).

WIOUv3 introduces a novel parameter, β, representing the outlier degree calculated as the ratio of a sample's IOU to the average IOUs of all samples within a batch. A small β indicates a high-quality sample, warranting a minimal gradient assignment, whereas a large β signifies a low-quality sample, to which a small weight is assigned to mitigate adverse gradients. This system prioritizes samples of average quality for bounding box regression. The gradient assignment strategy dynamically adapts, optimizing loss values when the outlier degree meets a predefined constant, C. The dynamic nature of IOU_mean ensures that WIOUv3's sample quality criteria and gradient assignment strategies remain optimally aligned with current data characteristics, as depicted in Fig. 11, enhancing the model's overall performance in bounding box regression.

Fig. 11
figure 11

gt_box represents the ground-truth labeling box; pred_box represents the prediction box of the network

$$\beta =\frac{IOU}{IOU\_mean}$$
$$r=\frac{\beta }{\delta * {\alpha }^{\beta -\delta }}$$
$$WiseIO{U}_{v3} = r*WiseIO{U}_{v1}$$

Equation 5: \(WiseIO{U}_{v3}\) loss (\(\delta \) and \(\alpha \) are manually set hyperparameters).
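For concreteness, here is a minimal PyTorch sketch of Eqs. (4) and (5). It reads the IOU term as the IoU loss, as in the Wise-IoU paper [25] (the union-to-intersection ratio of Eq. (4) plays the same loss-like role), tracks IOU_mean externally as a running mean, and uses α and δ defaults of the kind recommended in [25]; all of these are assumptions of the sketch.

```python
import torch

def wiou_v3_loss(pred, gt, iou_loss_mean, alpha=1.9, delta=3.0):
    """pred, gt: (N, 4) boxes as (x1, y1, x2, y2); iou_loss_mean is the
    running mean IoU loss over recent batches (IOU_mean in Eq. 5)."""
    eps = 1e-7
    # Intersection and IoU-based loss term.
    x1 = torch.maximum(pred[:, 0], gt[:, 0]); y1 = torch.maximum(pred[:, 1], gt[:, 1])
    x2 = torch.minimum(pred[:, 2], gt[:, 2]); y2 = torch.minimum(pred[:, 3], gt[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_g = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
    iou_loss = 1.0 - inter / (area_p + area_g - inter + eps)

    # RWIOU (Eq. 4): squared centre distance over the squared dimensions of
    # the smallest enclosing box; the denominator is detached from the graph.
    cxp, cyp = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    cxg, cyg = (gt[:, 0] + gt[:, 2]) / 2, (gt[:, 1] + gt[:, 3]) / 2
    W = torch.maximum(pred[:, 2], gt[:, 2]) - torch.minimum(pred[:, 0], gt[:, 0])
    H = torch.maximum(pred[:, 3], gt[:, 3]) - torch.minimum(pred[:, 1], gt[:, 1])
    dist2 = (cxp - cxg) ** 2 + (cyp - cyg) ** 2
    rwiou = torch.exp(dist2 / (W ** 2 + H ** 2 + eps).detach())

    wiou_v1 = rwiou * iou_loss                    # Eq. (4)
    beta = iou_loss.detach() / iou_loss_mean      # outlier degree (Eq. 5)
    r = beta / (delta * alpha ** (beta - delta))  # dynamic gradient gain
    return r * wiou_v1                            # Eq. (5)
```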

Analysis of experimental results

Experimental environment

The configuration of the experimental machine is shown in Table 1, and the hyperparameter settings are listed in Table 2.

Table 1 Experimental machine configuration
Table 2 Experimental hyperparameter settings

Evaluation index

The primary evaluation metrics of this experiment are the AP (AP0.75) value of each label and the MAP (MAP0.75) value over all labels. The IOU threshold for judging positive and negative samples is 0.75, and the confidence score threshold is 0.5. The formulas for calculating the MAP and the per-category AP are shown in Eq. (6).

$$P_{interp}\left(r\right) = \underset{{r}^{\prime}>r}{\text{max}}\left\{P\left({r}^{\prime}\right)\right\}$$
$$AP=\sum_{i=1}^{n}\left({r}_{i+1}-{r}_{i}\right){P}_{interp}\left({r}_{i+1}\right)$$
$$MAP=\frac{{\sum }_{i=1}^{K}A{P}_{i}}{K}$$

Equation 6: \(i\) indexes the current category, \(K\) is the total number of categories, \(r\) is the current recall value, \(P\) is the precision–recall curve, and \({P}_{interp}\) is the interpolated P–R curve used for the MAP calculation.
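For reference, a small NumPy sketch of Eq. (6); the function names are ours, and the interpolation is computed as a running maximum of precision scanned from high recall to low.

```python
import numpy as np

def interpolated_ap(recall, precision):
    """AP per Eq. (6): interpolate P(r) to its maximum over r' >= r, then
    sum the interpolated precision over each recall increment."""
    precision = np.asarray(precision, dtype=float)
    recall = np.asarray(recall, dtype=float)      # assumed sorted ascending
    p_interp = np.maximum.accumulate(precision[::-1])[::-1]  # precision envelope
    r = np.concatenate(([0.0], recall))
    return float(np.sum((r[1:] - r[:-1]) * p_interp))

def mean_ap(ap_per_class):
    """MAP: average of the per-category AP values over the K categories."""
    return float(np.mean(ap_per_class))
```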

Parameter and performance of each combination (module)

The analysis of data presented in Table 3 reveals that the model achieves optimal performance with convolutional kernels sized 7 for rod blocks and 3 for cone blocks, maintaining a kernel number ratio of 3:2. This configuration yields a performance enhancement of 3.6% over the baseline model. Such improvement aligns with the physiological makeup of the human retina, wherein the number of cone cells is fewer than rod cells, and cone cells have a smaller receptive field than rod cells. This correlation substantiates the efficacy of the proposed cone-rod cell module's approach to spatial channel separation, underscoring the heuristic design's validity.

Table 3 Performance of different rod module design options
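As a purely hypothetical illustration of the Table 3 configuration, the sketch below pairs a 7 × 7 "rod" branch with a 3 × 3 "cone" branch at a 3:2 channel ratio; the real module's internals (depthwise versus ordinary convolutions, strides, and the coupling with the attention blocks) follow the paper's figures, which this sketch only approximates.

```python
import torch
import torch.nn as nn

class ConeRodStem(nn.Module):
    """Hypothetical input stem: a rod branch (7x7 kernel, wider receptive
    field, more channels) and a cone branch (3x3 kernel, finer detail,
    fewer channels), concatenated at a 3:2 rod:cone channel ratio."""
    def __init__(self, in_ch: int = 3, out_ch: int = 40):
        super().__init__()
        rod_ch, cone_ch = out_ch * 3 // 5, out_ch * 2 // 5   # 3:2 split
        self.rod = nn.Sequential(
            nn.Conv2d(in_ch, rod_ch, 7, padding=3, bias=False),
            nn.BatchNorm2d(rod_ch), nn.SiLU())
        self.cone = nn.Sequential(
            nn.Conv2d(in_ch, cone_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(cone_ch), nn.SiLU())

    def forward(self, x):
        return torch.cat([self.rod(x), self.cone(x)], dim=1)
```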

Additionally, Table 4 outlines the evolution of network structure parameters, computational demand, and MAP (Mean Average Precision) values throughout the model's design process. Notably, the incorporation of the RepVgg module at two distinct phases resulted in a cumulative MAP increase of 5.3% (2.5% + 2.8%). Subsequent application of reparameterization techniques effectively reduced the increased parameters and computational requirements, achieving lossless compression. When compared to alternative methods such as distillation, pruning, and lightweight modular design, this network architecture demonstrates superior performance enhancements, highlighting its advantages in optimizing model efficiency and effectiveness.

Table 4 Parametric quantities and calculations and performance at each stage of model modification

Comparison of full model effects

The models under comparison are lightweight object detection frameworks, detailed in Tables 5 and 6. The enhanced model introduced in this paper, termed CRWYOLO, is evaluated from three distinct perspectives to gauge its effectiveness. Firstly, the overall performance of the model is considered, which includes the global detection capability (Mean Average Precision, MAP) alongside the number of parameters and computational operations. Secondly, the model's effectiveness in detecting small-volume objects is assessed, examining the Average Precision (AP) values for six categories of small-volume labels: echinus, holothurian, fish, scallop, jellyfish, and starfish. Lastly, the model's performance in detecting camouflaged objects is analyzed through the AP values for two categories of camouflaged object labels: cuttlefish and corals.

Table 5 Model calculations and number of parameters
Table 6 AP/%

This multifaceted evaluation approach allows for a comprehensive assessment of the CRWYOLO model's capabilities, encompassing general detection efficiency, proficiency in recognizing small-volume objects, and accuracy in identifying camouflaged entities.

Global detection results

This study evaluates the viability of an improved lightweight design for underwater object detection, analyzing the model named CRWYOLO. With 4.62 million parameters, CRWYOLO exceeds only YOLOv8-n, YOLOv5_n (v6.1), and Efficientdet-b0 in parameter count among all compared models. Its computational demand, at 10.2 billion operations, is lower than that of most control models. Despite its efficiency, CRWYOLO's Mean Average Precision (MAP) significantly exceeds that of all control models, outdoing the top-performing model, YOLOv8-s, by 5.8%. This demonstrates that strategic modifications at various stages enhance underwater detection capabilities without excessively increasing the network's size or computational burden, thereby maintaining its lightweight design.

Small volume label detection results

The effectiveness of this improved approach is further assessed through the detection of small-sized object clusters across six categories: echinus, holothurian, fish, scallop, jellyfish, and starfish. The proposed detection strategy addresses the challenge of low spatial semantics at multiple scales and enhances the network's learning capacity. This results in improved detection of small-volume objects clustered in underwater environments, with notable improvements in fish, scallop, and jellyfish categories. Specifically, jellyfish detection improved by 1.9 AP values over the control model Detr-resnet50's best score of 66.2. However, enhancements in echinus, starfish, and holothurian categories were less pronounced, with echinus showing only a modest increase to an AP value of 53.1. This indicates the proposed method's effectiveness in tackling small-object clustering issues while also suggesting room for further refinement.

Figure 12 clearly illustrates the differences in detecting small object clusters between CRWYOLO and control models. The figure highlights the superior detection accuracy of CRWYOLO (Model E), particularly in densely clustered fish areas, where it outperforms control models by correctly identifying individual fish with higher probability values. In contrast, control models (A, B, D) inaccurately recognize seaweed as echinus, evidenced by orange detection boxes, showcasing CRWYOLO's enhanced detection capabilities.

Fig. 12
figure 12

A Detr-resnet50, B mobileSSD, C Efficientdet-b1, D YOLOv8-s, E Program of this paper (CRWYOLO), F Real labels

Camouflage labeling detection results

To assess the effectiveness of the proposed method in detecting camouflaged objects, the analysis now focuses on the Average Precision (AP) values for two specific categories: cuttlefish and corals. The cuttlefish category benefits from its distinctive texture, which is more readily identifiable in marine settings, resulting in impressive detection performance across the various control models. Despite this, there remains room for enhancement. The proposed method outperforms the highest AP value reported by a control model for cuttlefish detection, YOLOv8-m (91.3), achieving an increase of 1.9.

Conversely, corals, due to their high similarity to the surrounding marine environment, present a greater challenge, often leading to omissions and false detections. The control models generally exhibit poorer performance in detecting corals. However, the proposed method marks a significant improvement in this category, surpassing the top-performing control model, Centernet-resnet50 (62.2), with an increase of 6.8 in AP value.

These results underscore the utility of incorporating Wise Intersection over Union (WIOU) in underwater object detection tasks, particularly for camouflaged objects. The technique enhances detection by prioritizing the preservation of unique textural information in camouflaged entities, thereby improving the model's overall recognition capabilities.

Although control models A, B, C, and D demonstrated proficiency in identifying cuttlefish within Fig. 13, they faltered in accurately detecting coral labels. In contrast, the model developed in this study (E) effectively distinguishes corals from their environment. This achievement highlights the role of gradient weight attenuation in WIOU, which focuses on balancing the quality of samples during the loss calculation phase, thereby elevating the detection of underwater camouflaged objects.

Fig. 13
figure 13

A Detr-resnet50, B mobileSSD, C Efficientdet-b1, D YOLOv8-s, E Program of this paper (CRWYOLO), F Real labels

Other indicators of model performance

To support the credibility of the models and data in this paper, we also collected several other indicators that measure the performance of deep learning models (these metrics are not analysed here and are provided to the reader only as supplementary experimental data). Figure 14 shows the MAP, F1, precision, and recall curves of CRWYOLO.

Fig. 14
figure 14

MAP, F1, precision, and recall curves

Conclusion

This paper introduces innovative design elements focusing on model architecture and loss function optimization. The first innovation involves an input module inspired by the human retina's optic cone and rod cells. This module is adept at mitigating various types of optical noise prevalent in underwater environments. The second key enhancement is the integration of a structural reparameterization module into the network's Backbone and Neck. This addition significantly bolsters the model's capability to comprehend multi-scale features and image semantics, facilitating lossless compression. As a result, it achieves an effective balance between computational efficiency and the detection performance of small-object clusters.

In addition, the implementation of Wise Intersection over Union (WIOU) plays a crucial role in handling sample quality. It down-weights the loss contribution of noisy, lower-quality samples and suppresses the common textures in high-quality samples that might obscure the unique textures of camouflaged objects. This strategy is specifically tailored to optimize underwater object detection.

Despite these advancements, areas for further improvement remain. The cone-rod cell module, while effective in reducing light interference, adds to the model's parameter count and computational demands. Future research aims to streamline this module, reducing its resource consumption for practical engineering applications. Additionally, while structural reparameterization has improved multi-scale feature extraction, there is potential for further gains in the AP values of certain small object categories such as echinus, holothurian, fish, scallop, and jellyfish. Upcoming experiments will concentrate on improving the matching of positive and negative samples and refining the candidate frame selection algorithm, aiming to develop a more targeted approach for small object clusters.

Furthermore, following structural reparameterization, the model is suitable for deployment on specific hardware. However, its acceleration performance is suboptimal under the TensorRT deployment framework. It is hypothesized that structural reparameterization moves the fusion of operators, normally handled by the deployment framework, up to the model code level [26, 27]. This hypothesis presents an intriguing avenue for future research, offering not only optimization strategies for underwater object detection but also practical insights into the control of structural reparameterization.