ParaLkResNet: an efficient multi-scale image classification network

Abstract

Recently, deep neural networks have achieved remarkable results in computer vision tasks, aided by the widely used visual attention mechanism. However, visual attention increases parameter counts and computational complexity, which limits its application in resource-constrained environments. To address this problem, we propose the ParaLk block (PLB), a novel large-kernel parallel convolutional block. We apply the PLB to PreActResNet by replacing the first 2D convolution, so that the network captures feature maps at different scales, and call the resulting network ParaLkResNet. In practice, the effective receptive field of a convolutional network is smaller than its theoretical receptive field, so the PLB is used to enlarge the network's receptive field. Beyond extracting multi-scale, well-fused features that a plain 2D convolution cannot, the PLB has low latency in typical downstream tasks and scales well to different data. Notably, the PLB is a plug-in block that can be applied to various computer vision tasks, not only image classification. The proposed method outperforms most current classification networks: on the CIFAR-10 dataset, accuracy improves by 2.42% and 0.66% over OTTT and IM-Loss, respectively. Our source code is available at: https://doi.org/10.5281/zenodo.11204902.
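
The exact PLB design is given in the linked source code rather than reproduced here, so the following is only a minimal PyTorch sketch of the idea the abstract describes: several convolutions with different (including large) kernel sizes run in parallel and their outputs are fused, letting the block drop in for a single 2D convolution while enlarging the effective receptive field. The class name, the kernel sizes, and the sum-then-BatchNorm fusion are illustrative assumptions, not the authors' specification.

```python
import torch
import torch.nn as nn

class ParaLkBlockSketch(nn.Module):
    """Hypothetical sketch of a large-kernel parallel convolutional block.

    Assumes parallel same-resolution branches with different kernel sizes
    whose outputs are summed, so the block is a drop-in replacement for a
    single stride-1 2D convolution.
    """

    def __init__(self, in_ch: int, out_ch: int, kernel_sizes=(3, 7, 11)):
        super().__init__()
        # One branch per kernel size; padding k // 2 preserves spatial size
        # for odd k, so all branch outputs can be fused element-wise.
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False)
            for k in kernel_sizes
        )
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Element-wise summation fuses multi-scale features without
        # widening the output, keeping latency close to a single conv.
        return self.bn(sum(branch(x) for branch in self.branches))

block = ParaLkBlockSketch(3, 64)
print(block(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```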

Data and code availability

The CIFAR-10 dataset supporting the results of this study is publicly available at https://www.cs.toronto.edu/~kriz/cifar.html, reference number [44]. The CINIC-10 dataset is openly available at http://dx.doi.org/10.7488/ds/2448, reference number [45].
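
For readers reproducing the experiments, both datasets load with standard tooling; the sketch below uses torchvision, with commonly quoted CIFAR-10 channel statistics for normalization and a hypothetical local path for CINIC-10 (the archive from the DOI above must be downloaded and unpacked first).

```python
import torch
from torchvision import datasets, transforms

# Commonly quoted CIFAR-10 per-channel means and standard deviations.
tfm = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465),
                         (0.2470, 0.2435, 0.2616)),
])

# CIFAR-10 downloads directly through torchvision.
cifar_train = datasets.CIFAR10(root="./data", train=True,
                               download=True, transform=tfm)

# CINIC-10 ships as train/valid/test image folders once unpacked;
# the path here is illustrative.
# cinic_train = datasets.ImageFolder("./data/cinic-10/train", transform=tfm)

loader = torch.utils.data.DataLoader(cifar_train, batch_size=128,
                                     shuffle=True, num_workers=2)
```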

References

  1. Dai, L., et al.: A deep learning system for detecting diabetic retinopathy across the disease spectrum. Nat. Commun. 12(1), 3242 (2021)

  2. Dai, L., et al.: A deep learning system for predicting time to progression of diabetic retinopathy. Nat. Med. 1–11 (2024)

  3. Nazir, A., et al.: OFF-eNET: an optimally fused fully end-to-end network for automatic dense volumetric 3D intracranial blood vessels segmentation. IEEE Trans. Image Process. 29, 7192–7202 (2020)

  4. Qin, Y., et al.: UrbanEvolver: function-aware urban layout regeneration. Int. J. Comput. Vis. 1–20 (2024)

  5. Cheng, Z., et al.: Deep colorization. In: Proceedings of the IEEE International Conference on Computer Vision (2015)

  6. Zhang, B., et al.: Depth of field rendering using multilayer-neighborhood optimization. IEEE Trans. Vis. Comput. Gr. 26(8), 2546–2559 (2019)

  7. Jiang, N., et al.: Photohelper: portrait photographing guidance via deep feature retrieval and fusion. IEEE Trans. Multimed. (2022)

  8. Sheng, B., et al.: Improving video temporal consistency via broad learning system. IEEE Trans. Cybern. 52(7), 6662–6675 (2021)

  9. Qian, B., et al.: DRAC 2022: a public benchmark for diabetic retinopathy analysis on ultra-wide optical coherence tomography angiography images. Patterns (2024)

  10. Sheng, B., et al.: Intrinsic image decomposition with step and drift shading separation. IEEE Trans. Vis. Comput. Gr. 26(2), 1332–1346 (2018)

  11. Chen, Z., et al.: Outdoor shadow estimating using multiclass geometric decomposition based on BLS. IEEE Trans. Cybern. 50(5), 2152–2165 (2018)

  12. Wei, Y., et al.: SurroundOcc: multi-camera 3D occupancy prediction for autonomous driving. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (2023)

  13. Li, J., et al.: Automatic detection and classification system of domestic waste via multimodel cascaded convolutional neural network. IEEE Trans. Ind. Inform. 18(1), 163–173 (2021)

  14. Tan, M., Le, Q.: Efficientnet: rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning. PMLR (2019)

  15. He, K., et al.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016)

  16. Howard, A.G., et al.: Mobilenets: efficient convolutional neural networks for mobile vision applications. Preprint at arXiv:1704.04861 (2017)

  17. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. Preprint at arXiv:1409.1556 (2014)

  18. Szegedy, C., et al.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015)

  19. Krizhevsky, A., et al.: Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 25 (2012)

  20. Xie, S., et al.: Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017)

  21. Zhao, H., et al.: Pyramid scene parsing network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017)

  22. Dosovitskiy, A., et al.: An image is worth \(16\times 16\) words: transformers for image recognition at scale. Preprint at arXiv:2010.11929 (2020)

  23. Xu, K., et al.: Show, attend and tell: neural image caption generation with visual attention. In: International Conference on Machine Learning. PMLR (2015)

  24. Hu, J., et al.: Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018)

  25. Jia, X., et al.: U-net vs transformer: Is u-net outdated in medical image registration? In: International Workshop on Machine Learning in Medical Imaging. Springer (2022)

  26. Chen, Z., et al.: MNGNAS: distilling adaptive combination of multiple searched networks for one-shot neural architecture search. IEEE Trans. Pattern Anal. Mach. Intell. (2023)

  27. Howard, A., et al.: Searching for mobilenetv3. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (2019)

  28. Guo, H., et al.: Multiview high dynamic range image synthesis using fuzzy broad learning system. IEEE Trans. Cybern. 51(5), 2735–2747 (2019)

  29. Chen, T., et al.: "BNN-BN=?": training binary neural networks without batch normalization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2021)

  30. Guo, Y., et al.: IM-loss: information maximization loss for spiking neural networks. Adv. Neural Inf. Process. Syst. 35, 156–166 (2022)

  31. Arora, S., et al.: On the optimization of deep networks: Implicit acceleration by overparameterization. In: International Conference on Machine Learning. PMLR (2018)

  32. Guo, S., et al.: ExpandNet: training compact networks by linear expansion. Preprint at arXiv:1811.10495 (2018)

  33. Trockman, A., Kolter, J.Z.: Patches are all you need? Preprint at arXiv:2201.09792 (2022)

  34. Zhang, H., et al.: EPSANet: an efficient pyramid squeeze attention block on convolutional neural network. In: Proceedings of the Asian Conference on Computer Vision (2022)

  35. Han, K., et al.: Transformer in transformer. Adv. Neural Inf. Process. Syst. 34, 15908–15919 (2021)

  36. Xie, Z., et al.: BaGFN: broad attentive graph fusion network for high-order feature interactions. IEEE Trans. Neural Netw. Learn. Syst. 34(8), 4499–4513 (2021)

  37. Lin, X., et al.: EAPT: efficient attention pyramid transformer for image processing. IEEE Trans. Multimed. 25, 50–61 (2021)

  38. Huo, X., et al.: HiFuse: hierarchical multi-scale feature fusion network for medical image classification. Preprint (2022)

  39. Araujo, A., et al.: Computing receptive fields of convolutional neural networks. Distill 4(11), e21 (2019)

  40. Ding, X., et al.: RepVGG: making VGG-style convnets great again. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2021)

  41. Ding, X., et al.: Scaling up your kernels to \(31\times 31\): revisiting large kernel design in CNNs. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2022)

  42. Cao, J., et al.: Do-conv: Depthwise over-parameterized convolutional layer. IEEE Trans. Image Process. 31, 3726–3736 (2022)

  43. Zheng, Y., et al.: Regularizing neural networks via adversarial model perturbation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2021)

  44. Krizhevsky, A., Hinton, G.: Learning multiple layers of features from tiny images (2009)

  45. Darlow, L.N., et al.: CINIC-10 is not ImageNet or CIFAR-10. Preprint at arXiv:1810.03505 (2018)

  46. Pishchik, E.: Trainable activations for image classification (2023)

  47. Romero, D.W., et al.: Flexconv: continuous kernel convolutions with differentiable kernel sizes. Preprint at arXiv:2110.08059 (2021)

  48. Bungert, L., et al.: A Bregman learning framework for sparse neural networks. J. Mach. Learn. Res. 23(1), 8673–8715 (2022)

  49. Schuler, J.P.S., et al.: Grouped pointwise convolutions reduce parameters in convolutional neural networks. Mendel (2022)

  50. Dwibedi, D., et al.: With a little help from my friends: nearest-neighbor contrastive learning of visual representations. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (2021)

  51. Xiao, M., et al.: Online training through time for spiking neural networks. Adv. Neural Inf. Process. Syst. 35, 20717–20730 (2022)

  52. Jeevan, P.: Convolutional xformers for vision. Preprint at arXiv:2201.10271 (2022)

  53. Zhu, C., et al.: Gradinit: learning to initialize neural networks for stable and efficient training. Adv. Neural Inf. Process. Syst. 34, 16410–16422 (2021)

  54. Gavrikov, P., Keuper, J.: CNN filter DB: an empirical investigation of trained convolutional filters. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2022)

  55. Schwarz Schuler, J.P., et al.: An enhanced scheme for reducing the complexity of pointwise convolutions in CNNs for image classification based on interleaved grouped filters without divisibility constraints. Entropy 24(9), 1264 (2022)

  56. Yao, D., et al.: Context-aware compilation of DNN training pipelines across edge and cloud. In: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, vol. 5, no. 4, pp. 1–27 (2021)

  57. Sander, M.E., et al.: Momentum residual neural networks. In: International Conference on Machine Learning. PMLR (2021)

  58. Hassani, A., et al.: Escaping the big data paradigm with compact transformers. Preprint at arXiv:2104.05704 (2021)

  59. Moreau, T., et al.: Benchopt: reproducible, efficient and collaborative optimization benchmarks. Adv. Neural Inf. Process. Syst. 35, 25404–25421 (2022)

  60. Chrysos, G.G., et al.: Augmenting deep classifiers with polynomial neural networks. In: European Conference on Computer Vision. Springer (2022)

  61. Kabir, H.D., et al.: SpinalNet: deep neural network with gradual input. IEEE Trans. Artif. Intell. (2022)

  62. Samad, S.A., Gitanjali, J.: SCMA: exploring dual-module attention with multi-scale kernels for effective feature extraction. IEEE Access (2023)

Funding

This research was funded by the National Natural Science Foundation of China (62341603, 62006107) and the Introduction and Cultivation Program for Young Innovative Talents of Universities in Shandong Province (2021QCYY003).

Author information

Contributions

Y.L. and X.W. contributed to the conception of the study; J.C. performed the experiment; T.Y. completed the analysis of the experimental data; T.Y., Y.L., H.L., and J.C. contributed significantly to analysis and manuscript preparation; H.L. and J.C. prepared all the graphs and tables; T.Y. and Y.L. performed the data analyses and wrote the manuscript; X.W. and J.C. supervised the preparation of the manuscript. Y.L., X.W., and J.C. reviewed and edited the manuscript. All authors reviewed the manuscript. X.W., J.C., and H.L. helped perform the analysis with constructive discussions.

Corresponding authors

Correspondence to Ji Chen or Xing Wang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yu, T., Liu, Y., Liu, H. et al. ParaLkResNet: an efficient multi-scale image classification network. Vis Comput (2024). https://doi.org/10.1007/s00371-024-03508-x
