Abstract
To further improve accuracy, DNNs have become deeper and require larger-scale datasets, which introduces dramatic computation costs. The outstanding performance of AI is inseparable from the support of high-end hardware, making such models difficult to deploy at the resource-limited edge. Therefore, large-scale AI models are generally deployed in the cloud, while end devices merely send input data to the cloud and then wait for the inference results. However, cloud-only inference limits the ubiquitous deployment of AI services. Specifically, it cannot guarantee the delay requirements of real-time services, e.g., real-time detection with strict latency demands. Moreover, for sensitive data sources, data safety and privacy protection must also be addressed. To deal with these issues, AI services tend to resort to edge computing. AI models should therefore be further customized to fit the resource-constrained edge, while carefully managing the trade-off between their inference accuracy and execution latency.
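The accuracy–latency trade-off described above can be made concrete with a minimal sketch: given a latency budget, choose between a full model in the cloud (accurate, but paying a network round-trip) and a compressed model on the edge device (faster end-to-end, but less accurate). All model names, accuracies, and timings below are illustrative assumptions, not measurements from the chapter.

```python
def select_model(options, latency_budget_ms):
    """Return the highest-accuracy option whose end-to-end latency
    (compute time + network round-trip) fits the budget, or None."""
    feasible = [o for o in options
                if o["compute_ms"] + o["network_ms"] <= latency_budget_ms]
    if not feasible:
        return None
    return max(feasible, key=lambda o: o["accuracy"])

# Hypothetical deployment options (illustrative numbers only).
options = [
    # Full model in the cloud: accurate, but pays a network round-trip.
    {"name": "cloud-full", "accuracy": 0.92, "compute_ms": 30, "network_ms": 120},
    # Compressed model on the edge device: less accurate, no network cost.
    {"name": "edge-compressed", "accuracy": 0.85, "compute_ms": 60, "network_ms": 0},
]

# A strict real-time budget forces edge inference; a loose one allows the cloud.
print(select_model(options, latency_budget_ms=100)["name"])  # edge-compressed
print(select_model(options, latency_budget_ms=200)["name"])  # cloud-full
```

Under a strict 100 ms budget the cloud option is infeasible and the compressed edge model wins despite its lower accuracy; relaxing the budget to 200 ms makes the more accurate cloud model the better choice.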
Copyright information
© 2020 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this chapter
Wang, X., Han, Y., Leung, V.C.M., Niyato, D., Yan, X., Chen, X. (2020). Artificial Intelligence Inference in Edge. In: Edge AI. Springer, Singapore. https://doi.org/10.1007/978-981-15-6186-3_5
Print ISBN: 978-981-15-6185-6
Online ISBN: 978-981-15-6186-3