Abstract
To further improve accuracy, DNNs have become deeper and require larger-scale datasets, which introduces dramatic computation costs. The outstanding performance of AI is inseparable from the support of high-end hardware, making such models difficult to deploy at the resource-limited edge. Therefore, large-scale AI models are generally deployed in the cloud, while end devices merely send input data to the cloud and then wait for the inference results. However, cloud-only inference limits the ubiquitous deployment of AI services. Specifically, it cannot guarantee the delay requirements of real-time services, e.g., real-time detection with strict latency demands. Moreover, for sensitive data sources, data safety and privacy protection must also be addressed. To deal with these issues, AI services tend to resort to edge computing. AI models should therefore be further customized to fit the resource-constrained edge, while carefully managing the trade-off between their inference accuracy and execution latency.
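The accuracy–latency trade-off described above can be made concrete with a minimal sketch: given a latency budget, choose between a full model in the cloud (accurate, but paying a network round-trip) and a compressed model on the edge device (faster end-to-end, but less accurate). All model names, accuracies, and timings below are illustrative assumptions, not measurements from the chapter.

```python
def select_model(options, latency_budget_ms):
    """Return the highest-accuracy option whose end-to-end latency
    (compute time + network round-trip) fits the budget, or None."""
    feasible = [o for o in options
                if o["compute_ms"] + o["network_ms"] <= latency_budget_ms]
    if not feasible:
        return None
    return max(feasible, key=lambda o: o["accuracy"])

# Hypothetical deployment options (illustrative numbers only).
options = [
    # Full model in the cloud: accurate, but pays a network round-trip.
    {"name": "cloud-full", "accuracy": 0.92, "compute_ms": 30, "network_ms": 120},
    # Compressed model on the edge device: less accurate, no network cost.
    {"name": "edge-compressed", "accuracy": 0.85, "compute_ms": 60, "network_ms": 0},
]

# A strict real-time budget forces edge inference; a loose one allows the cloud.
print(select_model(options, latency_budget_ms=100)["name"])  # edge-compressed
print(select_model(options, latency_budget_ms=200)["name"])  # cloud-full
```

Under a strict 100 ms budget the cloud option is infeasible and the compressed edge model wins despite its lower accuracy; relaxing the budget to 200 ms makes the more accurate cloud model the better choice.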
Copyright information
© 2020 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this chapter
Wang, X., Han, Y., Leung, V.C.M., Niyato, D., Yan, X., Chen, X. (2020). Artificial Intelligence Inference in Edge. In: Edge AI. Springer, Singapore. https://doi.org/10.1007/978-981-15-6186-3_5
Print ISBN: 978-981-15-6185-6
Online ISBN: 978-981-15-6186-3