Abstract
Object detection in seaside aerial images is difficult because of the high number of small object instances, interference from the background, and occlusion caused by crowds of people. These issues lead to low detection accuracy in this scenario. We propose YOLOv7-B, an improved model based on the YOLOv7 algorithm. We reconstruct the detection layer to reduce the miss rate of small objects. An Improved Bi-directional Feature Pyramid Network (IBi-FPN) replaces the Path Aggregation Network (PANet) of YOLOv7 to better integrate deep feature information with shallow feature information. Finally, we add a Convolutional Block Attention Module (CBAM) to improve the utilization of effective features. Experiments show that YOLOv7-B improves the detection accuracy of small objects at the seaside while reducing the number of parameters.
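The abstract does not specify how IBi-FPN combines deep and shallow features. As a rough illustration only, the sketch below shows the fast normalized fusion used by the original BiFPN in EfficientDet, which may differ from the improved variant used in YOLOv7-B. The function name `fast_normalized_fusion`, the flat-list feature representation, and the sample values are assumptions made for this example.

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse same-length feature vectors with non-negative, normalized weights.

    Sketch of BiFPN-style fusion (EfficientDet), not the paper's IBi-FPN.
    Each feature map is flattened to a plain list of floats for simplicity.
    """
    if not features or len(features) != len(weights):
        raise ValueError("need one weight per feature map")
    # ReLU keeps every learned weight non-negative.
    clipped = [max(0.0, w) for w in weights]
    # eps avoids division by zero when all weights are clipped to zero.
    total = sum(clipped) + eps
    fused = [0.0] * len(features[0])
    for feat, w in zip(features, clipped):
        if len(feat) != len(fused):
            raise ValueError("feature maps must have the same length")
        for i, value in enumerate(feat):
            fused[i] += (w / total) * value
    return fused


# Hypothetical shallow (high-resolution) and deep (semantic) features,
# assumed to be already resized to the same shape.
shallow = [1.0, 2.0, 3.0]
deep = [3.0, 2.0, 1.0]
fused_equal = fast_normalized_fusion([shallow, deep], [1.0, 1.0])
fused_shallow_only = fast_normalized_fusion([shallow, deep], [1.0, -5.0])
```

With equal weights the result is close to the element-wise mean of the two inputs. A negative weight is clipped to zero, so the second call returns approximately the shallow feature alone.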
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Yu, M., Jia, Y. (2024). Improved YOLOv7 Small Object Detection Algorithm for Seaside Aerial Images. In: Lu, H., Cai, J. (eds) Artificial Intelligence and Robotics. ISAIR 2023. Communications in Computer and Information Science, vol 1998. Springer, Singapore. https://doi.org/10.1007/978-981-99-9109-9_46
DOI: https://doi.org/10.1007/978-981-99-9109-9_46
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-9108-2
Online ISBN: 978-981-99-9109-9
eBook Packages: Computer Science (R0)