End-to-end data-dependent routing in multi-path neural networks

Original Article · Neural Computing and Applications

Abstract

Neural networks are known to perform better with increased depth, owing to their ability to learn increasingly abstract features. Although deepening networks is well established, there is still room for more efficient feature extraction within a layer, which would reduce the need for mere parameter increments. Conventionally widening a network by adding more filters to each layer increases the parameter count quadratically. Using multiple parallel convolutional/dense operations in each layer avoids this, but without context-dependent allocation of inputs among these operations, the parallel computations tend to learn similar features, making the widening less effective. We therefore propose multi-path neural networks with data-dependent allocation of resources among the parallel computations within layers, which also lets an input be routed end-to-end through these parallel paths. First, we introduce a cross-prediction-based routing algorithm between the parallel tensors of subsequent layers. Second, we further reduce the routing overhead by introducing feature-dependent cross-connections between the parallel tensors of successive layers. On image recognition tasks, our multi-path networks outperform existing widening and adaptive feature extraction methods, and even ensembles and deeper networks, at similar complexity.
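
The routing mechanisms themselves are defined in the body of the article; as a rough illustration of the second idea, the sketch below shows one way a feature-dependent cross-connection between two parallel paths could be written in PyTorch. This is a minimal sketch, not the authors' implementation: the module name `CrossConnect`, the two-path setting, and the particular gate (global average pooling followed by a linear layer and a softmax over source paths) are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossConnect(nn.Module):
    """Illustrative feature-dependent cross-connection between two paths.

    Each next-layer path receives a per-sample convex combination of the
    current layer's parallel tensors, so an input is softly routed
    end-to-end through the parallel paths.
    """

    def __init__(self, channels: int):
        super().__init__()
        # The gate maps the pooled descriptors of both paths (2*C values)
        # to a 2x2 matrix of routing weights per sample.
        self.gate = nn.Linear(2 * channels, 4)

    def forward(self, x1: torch.Tensor, x2: torch.Tensor):
        # x1, x2: (N, C, H, W) parallel tensors of the same layer.
        desc = torch.cat([x1.mean(dim=(2, 3)), x2.mean(dim=(2, 3))], dim=1)
        w = F.softmax(self.gate(desc).view(-1, 2, 2), dim=1)  # (N, 2, 2)
        # w[:, i, j] = weight of source path i feeding destination path j;
        # the softmax over sources makes each destination a convex mix.
        y1 = w[:, 0, 0, None, None, None] * x1 + w[:, 1, 0, None, None, None] * x2
        y2 = w[:, 0, 1, None, None, None] * x1 + w[:, 1, 1, None, None, None] * x2
        return y1, y2

# Example: route two 64-channel feature maps between successive layers.
cc = CrossConnect(channels=64)
a, b = torch.randn(8, 64, 32, 32), torch.randn(8, 64, 32, 32)
y1, y2 = cc(a, b)  # each (8, 64, 32, 32)
```

Because the routing weights are computed from each input's own features, different inputs can allocate the parallel paths differently; this is the context-dependent allocation that the abstract contrasts with plain widening.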

Data availability

The CIFAR-10 and CIFAR-100 datasets [58] are available at https://www.cs.toronto.edu/~kriz/cifar.html, and the ILSVRC 2012 dataset [1, 21] is available at https://www.image-net.org/challenges/LSVRC/2012/.

References

  1. Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M, Berg AC, Fei-Fei L (2015) ImageNet large scale visual recognition challenge. Int J Comput Vis (IJCV) 115(3):211–252. https://doi.org/10.1007/s11263-015-0816-y

  2. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). pp 770–778

  3. He K, Zhang X, Ren S, Sun J (2016) Identity mappings in deep residual networks. European conference on computer vision (ECCV). Springer, London, pp 630–645

  4. Romero A, Ballas N, Kahou SE, Chassang A, Gatta C, Bengio Y (2015) FitNets: hints for thin deep nets. In: proceedings of international conference on learning representations (ICLR)

  5. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556

  6. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp. 1–9

  7. Zagoruyko S, Komodakis N (2016) Wide residual networks. In: proceedings of the british machine vision conference (BMVC). pp 87.1–87.12

  8. Xie S, Girshick R, Dollár P, Tu Z, He K (2017) Aggregated residual transformations for deep neural networks. In: proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 1492–1500

  9. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: advances in neural information processing systems, pp. 1097–1105

  10. Ciregan D, Meier U, Schmidhuber J (2012) Multi-column deep neural networks for image classification. In: proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp. 3642–3649

  11. Wang M (2015) Multi-path convolutional neural networks for complex image classification. arXiv preprint arXiv:1506.04701

  12. Friedman JH (1991) Multivariate adaptive regression splines. Ann Statist 19(1):1–67

  13. Breiman L, Friedman JH, Olshen RA, Stone CJ (2017) Classification and regression trees. Routledge, Taylor, p 102

  14. Jacobs RA, Jordan MI, Nowlan SJ, Hinton GE (1991) Adaptive mixtures of local experts. Neural Comput 3(1):79–87

  15. Jordan MI, Jacobs RA (1994) Hierarchical mixtures of experts and the EM algorithm. Neural Comput 6(2):181–214

  16. Eigen D, Ranzato M, Sutskever I (2013) Learning factored representations in a deep mixture of experts. arXiv preprint arXiv:1312.4314

  17. Shazeer N, Mirhoseini A, Maziarz K, Davis A, Le Q, Hinton G, Dean J (2017) Outrageously large neural networks: the sparsely-gated mixture-of-experts layer. arXiv preprint arXiv:1701.06538

  18. Albawi S, Mohammed TA, Al-Zawi S (2017) Understanding of a convolutional neural network. In: 2017 international conference on engineering and technology (ICET), pp. 1–6

  19. Erhan D, Bengio Y, Courville A, Vincent P (2009) Visualizing higher-layer features of a deep network. Technical Report 1341, University of Montreal

  20. Kahatapitiya K, Tissera D, Rodrigo R (2019) Context-aware automatic occlusion removal. In: 2019 IEEE international conference on image processing (ICIP), pp. 1895–1899

  21. Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) ImageNet: a large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition, pp. 248–255

  22. Tissera D, Kahatapitiya K, Wijesinghe R, Fernando S, Rodrigo R (2019) Context-aware multipath networks. arXiv preprint arXiv:1907.11519

  23. Tissera D, Vithanage K, Wijesinghe R, Kahatapitiya K, Fernando S, Rodrigo R (2020) Feature-dependent cross-connections in multi-path neural networks. In: international conference on pattern recognition (ICPR), pp 4032–4039

  24. LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324

  25. Rumelhart DE, Hinton GE, Williams RJ (1986) Learning representations by back-propagating errors. Nature 323:533

  26. Szegedy C, Ioffe S, Vanhoucke V, Alemi AA (2017) Inception-v4, Inception-ResNet and the impact of residual connections on learning. In: AAAI conference on artificial intelligence

  27. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2818–2826

  28. Misra I, Shrivastava A, Gupta A, Hebert M (2016) Cross-stitch networks for multi-task learning. In: proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 3994–4003

  29. Caruana R (1997) Multitask learning. Mach Learn 28(1):41–75

  30. Thung K-H, Wee C-Y (2018) A brief review on multi-task learning. Multimed Tools Appl 77(22):29705–29725

  31. Crawshaw M (2020) Multi-task learning with deep neural networks: a survey. arXiv preprint arXiv:2009.09796

  32. Ruder S, Bingel J, Augenstein I, Søgaard A (2019) Latent multi-task architecture learning. In: proceedings of AAAI conference on artificial intelligence. pp 4822–4829

  33. Gao Y, Ma J, Zhao M, Liu W, Yuille AL (2019) NDDR-CNN: layerwise feature fusing in multi-task CNNs by neural discriminative dimensionality reduction. In: proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). pp 3205–3214

  34. Ha D, Dai A, Le QV (2017) Hypernetworks. In: proceedings of international conference on learning representations (ICLR)

  35. Cai S, Shu Y, Wang W (2021) Dynamic routing networks. In: proceedings of the IEEE/CVF winter conference on applications of computer vision. pp 3588–3597

  36. Hinton GE, Sabour S, Frosst N (2018) Matrix capsules with EM routing. In: proceedings of international conference on learning representations (ICLR)

  37. Hu J, Shen L, Albanie S, Sun G, Vedaldi A (2018) Gather-excite: Exploiting feature context in convolutional neural networks. In: advances in neural information processing systems. pp 9401–9411

  38. Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). pp 7132–7141

  39. Sabour S, Frosst N, Hinton GE (2017) Dynamic routing between capsules. In: advances in neural information processing systems, pp. 3856–3866

  40. Wang Q, Wu B, Zhu P, Li P, Zuo W, Hu Q (2019) ECA-Net: efficient channel attention for deep convolutional neural networks. arXiv preprint arXiv:1910.03151

  41. Veit A, Belongie S (2018) Convolutional networks with adaptive inference graphs. In: European conference on computer vision. pp 3–18

  42. Wu Z, Nagarajan T, Kumar A, Rennie S, Davis LS, Grauman K, Feris R (2018) BlockDrop: dynamic inference paths in residual networks. In: proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). pp 8817–8826

  43. Srivastava RK, Greff K, Schmidhuber J (2015) Highway networks. arXiv preprint arXiv:1505.00387

  44. Rao Y, Lu J, Lin J, Zhou J (2018) Runtime network routing for efficient image classification. IEEE Trans Pattern Anal Mach Intell 41(10):2291–2304

  45. Wang X, Yu F, Dou ZY, Darrell T, Gonzalez JE (2018) SkipNet: learning dynamic routing in convolutional networks. In: proceedings of the European conference on computer vision (ECCV), pp. 409–424

  46. Chen B, Zhao T, Liu J, Lin L (2021) Multipath feature recalibration DenseNet for image classification. Int J Mach Learn Cybern 12(3):651–660

  47. Zhang H, Wu C, Zhang Z, Zhu Y, Lin H, Zhang Z, Sun Y, He T, Mueller J, Manmatha R, et al (2022) ResNeSt: split-attention networks. In: proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 2736–2746

  48. Yu K, Wang X, Dong C, Tang X, Loy CC (2021) Path-restore: learning network path selection for image restoration. IEEE Trans Pattern Anal Mach Intell 44(10):7078–7092

  49. Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: proceedings of the IEEE conference on computer vision and pattern recognition. pp 4700–4708

  50. Srivastava RK, Greff K, Schmidhuber J (2015) Training very deep networks. Adv Neural Inf Process Syst 28:2377–2385

  51. Fedus W, Dean J, Zoph B (2022) A review of sparse expert models in deep learning. arXiv preprint arXiv:2209.01667

  52. Chen Z, Deng Y, Wu Y, Gu Q, Li Y (2022) Towards understanding mixture of experts in deep learning. arXiv preprint arXiv:2208.02813

  53. Lepikhin D, Lee H, Xu Y, Chen D, Firat O, Huang Y, Krikun M, Shazeer N, Chen Z (2020) GShard: scaling giant models with conditional computation and automatic sharding. arXiv preprint arXiv:2006.16668

  54. Fedus W, Zoph B, Shazeer N (2021) Switch transformers: scaling to trillion parameter models with simple and efficient sparsity. J Mach Learn Res 23:1–40

  55. Riquelme C, Puigcerver J, Mustafa B, Neumann M, Jenatton R, Susano Pinto A, Keysers D, Houlsby N (2021) Scaling vision with sparse mixture of experts. Adv Neural Inf Process Syst 34:8583–8595

  56. Wu L, Liu M, Chen Y, Chen D, Dai X, Yuan L (2022) Residual mixture of experts. arXiv preprint arXiv:2204.09636

  57. Glorot X, Bordes A, Bengio Y (2011) Deep sparse rectifier neural networks. In: proceedings of the fourteenth international conference on artificial intelligence and statistics. pp 315–323

  58. Krizhevsky A, Hinton G et al (2009) Learning multiple layers of features from tiny images. Technical report, Citeseer

  59. Ha D, Dai A, Le QV (2016) Hypernetworks. arXiv preprint arXiv:1609.09106

  60. Facebook: fb.resnet.torch. GitHub. https://github.com/facebookarchive/fb.resnet.torch

  61. Simonyan K, Vedaldi A, Zisserman A (2013) Deep inside convolutional networks: visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034

Funding

This research is funded by CODEGEN International (Pvt) Ltd, Sri Lanka.

Author information

Corresponding author

Correspondence to Dumindu Tissera.

Ethics declarations

Competing interests

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Tissera, D., Wijesinghe, R., Vithanage, K. et al. End-to-end data-dependent routing in multi-path neural networks. Neural Comput & Applic 35, 12655–12674 (2023). https://doi.org/10.1007/s00521-023-08381-8
