ParaLkResNet: an efficient multi-scale image classification network

Abstract

Recently, deep neural networks have achieved remarkable results in computer vision tasks, aided by the widely used visual attention mechanism. However, visual attention increases parameter counts and computational complexity, which limits its application in resource-constrained environments. To address this problem, we propose the ParaLk block (PLB), a novel large-kernel parallel convolutional block. We apply the PLB to PreActResNet by replacing the first 2D convolution, so that the network captures feature maps at different scales, and call the resulting network ParaLkResNet. In practice, the effective receptive field of a convolutional network is smaller than its theoretical receptive field, so the PLB is used to enlarge the network's receptive field. Beyond extracting multi-scale, well-fused features that a plain 2D convolution cannot, the PLB has low latency in typical downstream tasks and scales well to different data. Notably, the PLB is a plug-in block that can be applied to various computer vision tasks, not only image classification. The proposed method outperforms most current classification networks: on the CIFAR-10 dataset, accuracy improves by 2.42% and 0.66% over OTTT and IM-Loss, respectively. Our source code is available at: https://doi.org/10.5281/zenodo.11204902.
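
The exact PLB design is given in the linked source code rather than reproduced here, so the following is only a minimal PyTorch sketch of the idea the abstract describes: several convolutions with different (including large) kernel sizes run in parallel and their outputs are fused, letting the block drop in for a single 2D convolution while enlarging the effective receptive field. The class name, the kernel sizes, and the sum-then-BatchNorm fusion are illustrative assumptions, not the authors' specification.

```python
import torch
import torch.nn as nn

class ParaLkBlockSketch(nn.Module):
    """Hypothetical sketch of a large-kernel parallel convolutional block.

    Assumes parallel same-resolution branches with different kernel sizes
    whose outputs are summed, so the block is a drop-in replacement for a
    single stride-1 2D convolution.
    """

    def __init__(self, in_ch: int, out_ch: int, kernel_sizes=(3, 7, 11)):
        super().__init__()
        # One branch per kernel size; padding k // 2 preserves spatial size
        # for odd k, so all branch outputs can be fused element-wise.
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False)
            for k in kernel_sizes
        )
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Element-wise summation fuses multi-scale features without
        # widening the output, keeping latency close to a single conv.
        return self.bn(sum(branch(x) for branch in self.branches))

block = ParaLkBlockSketch(3, 64)
print(block(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```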

Data and code availability

The CIFAR-10 dataset supporting the results of this study is publicly available at https://www.cs.toronto.edu/~kriz/cifar.html, reference number [44]. The CINIC-10 dataset is openly available at http://dx.doi.org/10.7488/ds/2448, reference number [45].
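
For readers reproducing the experiments, both datasets load with standard tooling; the sketch below uses torchvision, with commonly quoted CIFAR-10 channel statistics for normalization and a hypothetical local path for CINIC-10 (the archive from the DOI above must be downloaded and unpacked first).

```python
import torch
from torchvision import datasets, transforms

# Commonly quoted CIFAR-10 per-channel means and standard deviations.
tfm = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465),
                         (0.2470, 0.2435, 0.2616)),
])

# CIFAR-10 downloads directly through torchvision.
cifar_train = datasets.CIFAR10(root="./data", train=True,
                               download=True, transform=tfm)

# CINIC-10 ships as train/valid/test image folders once unpacked;
# the path here is illustrative.
# cinic_train = datasets.ImageFolder("./data/cinic-10/train", transform=tfm)

loader = torch.utils.data.DataLoader(cifar_train, batch_size=128,
                                     shuffle=True, num_workers=2)
```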

References

  1. Dai, L., et al.: A deep learning system for detecting diabetic retinopathy across the disease spectrum. Nat. Commun. 12(1), 3242 (2021)

  2. Dai, L., et al.: A deep learning system for predicting time to progression of diabetic retinopathy. Nat. Med. 1–11 (2024)

  3. Nazir, A., et al.: OFF-eNET: an optimally fused fully end-to-end network for automatic dense volumetric 3D intracranial blood vessels segmentation. IEEE Trans. Image Process. 29, 7192–7202 (2020)

  4. Qin, Y., et al.: UrbanEvolver: function-aware urban layout regeneration. Int. J. Comput. Vis. 1–20 (2024)

  5. Cheng, Z., et al.: Deep colorization. In: Proceedings of the IEEE International Conference on Computer Vision (2015)

  6. Zhang, B., et al.: Depth of field rendering using multilayer-neighborhood optimization. IEEE Trans. Vis. Comput. Gr. 26(8), 2546–2559 (2019)

  7. Jiang, N., et al.: Photohelper: portrait photographing guidance via deep feature retrieval and fusion. IEEE Trans. Multimed. (2022)

  8. Sheng, B., et al.: Improving video temporal consistency via broad learning system. IEEE Trans. Cybern. 52(7), 6662–6675 (2021)

  9. Qian, B., et al.: DRAC 2022: a public benchmark for diabetic retinopathy analysis on ultra-wide optical coherence tomography angiography images. Patterns (2024)

  10. Sheng, B., et al.: Intrinsic image decomposition with step and drift shading separation. IEEE Trans. Vis. Comput. Gr. 26(2), 1332–1346 (2018)

  11. Chen, Z., et al.: Outdoor shadow estimating using multiclass geometric decomposition based on BLS. IEEE Trans. Cybern. 50(5), 2152–2165 (2018)

  12. Wei, Y., et al.: SurroundOcc: multi-camera 3D occupancy prediction for autonomous driving. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (2023)

  13. Li, J., et al.: Automatic detection and classification system of domestic waste via multimodel cascaded convolutional neural network. IEEE Trans. Ind. Inform. 18(1), 163–173 (2021)

  14. Tan, M., Le, Q.: Efficientnet: rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning. PMLR (2019)

  15. He, K., et al.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016)

  16. Howard, A.G., et al.: Mobilenets: efficient convolutional neural networks for mobile vision applications. Preprint at arXiv:1704.04861 (2017)

  17. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. Preprint at arXiv:1409.1556 (2014)

  18. Szegedy, C., et al.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015)

  19. Krizhevsky, A., et al.: Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 25 (2012)

  20. Xie, S., et al.: Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017)

  21. Zhao, H., et al.: Pyramid scene parsing network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017)

  22. Dosovitskiy, A., et al.: An image is worth \(16\times 16\) words: transformers for image recognition at scale. Preprint at arXiv:2010.11929 (2020)

  23. Xu, K., et al.: Show, attend and tell: neural image caption generation with visual attention. In: International Conference on Machine Learning. PMLR (2015)

  24. Hu, J., et al.: Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018)

  25. Jia, X., et al.: U-net vs transformer: Is u-net outdated in medical image registration? In: International Workshop on Machine Learning in Medical Imaging. Springer (2022)

  26. Chen, Z., et al.: MNGNAS: distilling adaptive combination of multiple searched networks for one-shot neural architecture search. IEEE Trans. Pattern Anal. Mach. Intell. (2023)

  27. Howard, A., et al.: Searching for mobilenetv3. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (2019)

  28. Guo, H., et al.: Multiview high dynamic range image synthesis using fuzzy broad learning system. IEEE Trans. Cybern. 51(5), 2735–2747 (2019)

  29. Chen, T., et al.: "BNN-BN=?": training binary neural networks without batch normalization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2021)

  30. Guo, Y., et al.: IM-loss: information maximization loss for spiking neural networks. Adv. Neural Inf. Process. Syst. 35, 156–166 (2022)

  31. Arora, S., et al.: On the optimization of deep networks: Implicit acceleration by overparameterization. In: International Conference on Machine Learning. PMLR (2018)

  32. Guo, S., et al.: ExpandNet: training compact networks by linear expansion. Preprint at arXiv:1811.10495 (2018)

  33. Trockman, A., Kolter, J.Z.: Patches are all you need? Preprint at arXiv:2201.09792 (2022)

  34. Zhang, H., et al.: EPSANet: an efficient pyramid squeeze attention block on convolutional neural network. In: Proceedings of the Asian Conference on Computer Vision (2022)

  35. Han, K., et al.: Transformer in transformer. Adv. Neural Inf. Process. Syst. 34, 15908–15919 (2021)

  36. Xie, Z., et al.: BaGFN: broad attentive graph fusion network for high-order feature interactions. IEEE Trans. Neural Netw. Learn. Syst. 34(8), 4499–4513 (2021)

  37. Lin, X., et al.: EAPT: efficient attention pyramid transformer for image processing. IEEE Trans. Multimed. 25, 50–61 (2021)

  38. Huo, X., et al.: HiFuse: hierarchical multi-scale feature fusion network for medical image classification. Preprint (2022)

  39. Araujo, A., et al.: Computing receptive fields of convolutional neural networks. Distill 4(11), e21 (2019)

  40. Ding, X., et al.: RepVGG: making VGG-style convnets great again. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2021)

  41. Ding, X., et al.: Scaling up your kernels to \(31\times 31\): revisiting large kernel design in CNNs. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2022)

  42. Cao, J., et al.: Do-conv: Depthwise over-parameterized convolutional layer. IEEE Trans. Image Process. 31, 3726–3736 (2022)

  43. Zheng, Y., et al.: Regularizing neural networks via adversarial model perturbation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2021)

  44. Krizhevsky, A., Hinton, G.: Learning multiple layers of features from tiny images (2009)

  45. Darlow, L.N., et al.: CINIC-10 is not ImageNet or CIFAR-10. Preprint at arXiv:1810.03505 (2018)

  46. Pishchik, E.: Trainable activations for image classification (2023)

  47. Romero, D.W., et al.: Flexconv: continuous kernel convolutions with differentiable kernel sizes. Preprint at arXiv:2110.08059 (2021)

  48. Bungert, L., et al.: A Bregman learning framework for sparse neural networks. J. Mach. Learn. Res. 23(1), 8673–8715 (2022)

  49. Schuler, J.P.S., et al.: Grouped pointwise convolutions reduce parameters in convolutional neural networks. Mendel (2022)

  50. Dwibedi, D., et al.: With a little help from my friends: nearest-neighbor contrastive learning of visual representations. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (2021)

  51. Xiao, M., et al.: Online training through time for spiking neural networks. Adv. Neural Inf. Process. Syst. 35, 20717–20730 (2022)

  52. Jeevan, P.: Convolutional xformers for vision. Preprint at arXiv:2201.10271 (2022)

  53. Zhu, C., et al.: Gradinit: learning to initialize neural networks for stable and efficient training. Adv. Neural Inf. Process. Syst. 34, 16410–16422 (2021)

  54. Gavrikov, P., Keuper, J.: CNN filter DB: an empirical investigation of trained convolutional filters. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2022)

  55. Schwarz Schuler, J.P., et al.: An enhanced scheme for reducing the complexity of pointwise convolutions in CNNs for image classification based on interleaved grouped filters without divisibility constraints. Entropy 24(9), 1264 (2022)

  56. Yao, D., et al.: Context-aware compilation of DNN training pipelines across edge and cloud. In: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, vol. 5, no. 4, pp. 1–27 (2021)

  57. Sander, M.E., et al.: Momentum residual neural networks. In: International Conference on Machine Learning. PMLR (2021)

  58. Hassani, A., et al.: Escaping the big data paradigm with compact transformers. Preprint at arXiv:2104.05704 (2021)

  59. Moreau, T., et al.: Benchopt: reproducible, efficient and collaborative optimization benchmarks. Adv. Neural Inf. Process. Syst. 35, 25404–25421 (2022)

  60. Chrysos, G.G., et al.: Augmenting deep classifiers with polynomial neural networks. In: European Conference on Computer Vision. Springer (2022)

  61. Kabir, H.D., et al.: SpinalNet: deep neural network with gradual input. IEEE Trans. Artif. Intell. (2022)

  62. Samad, S.A., Gitanjali, J.: SCMA: exploring dual-module attention with multi-scale kernels for effective feature extraction. IEEE Access (2023)

Funding

This research was funded by the National Natural Science Foundation of China (62341603, 62006107) and the Introduction and Cultivation Program for Young Innovative Talents of Universities in Shandong Province (2021QCYY003).

Author information

Contributions

Y.L. and X.W. contributed to the conception of the study; J.C. performed the experiment; T.Y. completed the analysis of the experimental data; T.Y., Y.L., H.L., and J.C. contributed significantly to analysis and manuscript preparation; H.L. and J.C. prepared all the graphs and tables; T.Y. and Y.L. performed the data analyses and wrote the manuscript; X.W. and J.C. supervised the preparation of the manuscript. Y.L., X.W., and J.C. reviewed and edited the manuscript. All authors reviewed the manuscript. X.W., J.C., and H.L. helped perform the analysis with constructive discussions.

Corresponding authors

Correspondence to Ji Chen or Xing Wang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yu, T., Liu, Y., Liu, H. et al. ParaLkResNet: an efficient multi-scale image classification network. Vis Comput (2024). https://doi.org/10.1007/s00371-024-03508-x
