YOLO-MTG: a lightweight YOLO model for multi-target garbage detection

  • Original Paper
  • Published in Signal, Image and Video Processing

Abstract

With the wide adoption of deep learning in artificial intelligence, intelligent garbage detection has become an active research topic. However, the datasets currently used for garbage detection rarely involve the multi-category, multi-target garbage that accumulates densely in real detection scenarios. In addition, many existing garbage detection models suffer from low detection efficiency and are difficult to deploy on resource-constrained devices. To address these issues, this study proposes a lightweight YOLO model for multi-target garbage detection (YOLO-MTG). The model is designed as follows: first, MobileViTv3, a lightweight hybrid network, serves as the feature extraction network to encode global representations, enhancing the model's ability to discriminate dense targets. Second, the MobileViT block, its core feature extraction unit, is optimized with a combination of EfficientFormer and dynamic convolution to strengthen feature extraction, focus the model on essential feature information, and reduce redundant, uninformative computation. Finally, feature reuse techniques are used to reconstruct the neck, minimizing the loss of channel information during feature transmission while maintaining the model's strong feature fusion ability. Experimental results on the self-built multi-target garbage (MTG) dataset show that YOLO-MTG achieves 95.4% mean average precision (mAP) with only 3.4 M parameters, outperforming other state-of-the-art (SOTA) methods. This work contributes new insights to the field of garbage detection and aims to advance garbage classification in practical engineering applications.
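The abstract names two generic building blocks: dynamic convolution inside the optimized MobileViT block, and feature reuse in the reconstructed neck. The paper's own implementation is not reproduced here; the two minimal PyTorch sketches below only illustrate the generic techniques. All class names, channel counts, and the attention MLP are illustrative assumptions, not the authors' code.

A dynamic convolution layer keeps K parallel kernels and mixes them per sample with softmax attention computed from globally pooled features, so the effective kernel adapts to each input at little extra cost:

import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv2d(nn.Module):
    """Illustrative sketch: attention-weighted mixture of K convolution kernels."""

    def __init__(self, in_ch, out_ch, kernel_size=3, num_kernels=4, reduction=4):
        super().__init__()
        self.in_ch, self.out_ch = in_ch, out_ch
        self.kernel_size, self.padding = kernel_size, kernel_size // 2
        # K parallel kernels stored as one tensor: (K, out_ch, in_ch, k, k)
        self.weight = nn.Parameter(
            torch.randn(num_kernels, out_ch, in_ch, kernel_size, kernel_size) * 0.02
        )
        hidden = max(in_ch // reduction, 4)
        # Lightweight attention branch: global pool -> 2-layer MLP -> K logits
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(in_ch, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, num_kernels),
        )

    def forward(self, x):
        b, c, h, w = x.shape
        alpha = F.softmax(self.attn(x), dim=1)            # (B, K), convex weights
        # Per-sample kernel mixture: (B, out_ch, in_ch, k, k)
        mixed = torch.einsum("bk,kocij->bocij", alpha, self.weight)
        # Grouped-conv trick applies a different kernel to each sample in the batch
        out = F.conv2d(
            x.reshape(1, b * c, h, w),
            mixed.reshape(b * self.out_ch, self.in_ch, self.kernel_size, self.kernel_size),
            padding=self.padding, groups=b,
        )
        return out.reshape(b, self.out_ch, h, w)

Feature reuse in the Ghost style generates part of the output channels with a regular convolution and derives the rest from those features with a cheap depthwise convolution, one plausible way to cut neck parameters while preserving channel information (out_ch is assumed even here):

class GhostConv(nn.Module):
    """Illustrative sketch: half the channels from a primary conv, half reused cheaply."""

    def __init__(self, in_ch, out_ch, kernel_size=1, cheap_kernel=3):
        super().__init__()
        primary = out_ch // 2
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, primary, kernel_size, padding=kernel_size // 2, bias=False),
            nn.BatchNorm2d(primary), nn.SiLU(inplace=True),
        )
        # Depthwise conv re-derives the remaining channels from the primary features
        self.cheap = nn.Sequential(
            nn.Conv2d(primary, out_ch - primary, cheap_kernel,
                      padding=cheap_kernel // 2, groups=primary, bias=False),
            nn.BatchNorm2d(out_ch - primary), nn.SiLU(inplace=True),
        )

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)

A quick shape check: DynamicConv2d(64, 128)(torch.randn(2, 64, 32, 32)) and GhostConv(64, 128)(torch.randn(2, 64, 32, 32)) both return tensors of shape (2, 128, 32, 32). The grouped-convolution trick in DynamicConv2d trades memory for simplicity; production implementations often batch the kernel mixture differently.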




Acknowledgements

The authors would like to thank the anonymous reviewers for their constructive comments and suggestions, which significantly contributed to improving the manuscript. This research was supported by Zhejiang Provincial Natural Science Foundation of China under Grant No. LY24F020005.

Author information

Contributions

Zhongyi Xia: Conceptualization, Methodology, Software, Formal analysis, Visualization, Writing – original draft. Houkui Zhou: Project administration, Data curation, Writing – review & editing. Huimin Yu: Data curation, Supervision. Haoji Hu: Investigation, Supervision. Guangqun Zhang: Investigation, Funding acquisition. Junguo Hu: Project administration, Funding acquisition. Tao He: Investigation.

Corresponding author

Correspondence to Houkui Zhou.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xia, Z., Zhou, H., Yu, H. et al. YOLO-MTG: a lightweight YOLO model for multi-target garbage detection. SIViP (2024). https://doi.org/10.1007/s11760-024-03220-2

