Artificial Intelligence Inference in Edge

Edge AI

Abstract

To further improve accuracy, deep neural networks (DNNs) have become deeper and require larger-scale datasets, which introduces substantial computation costs. The strong performance of AI depends on high-end hardware, so these models are difficult to deploy at the edge, where resources are limited. Large-scale AI models are therefore generally deployed in the cloud, while end devices simply send input data to the cloud and wait for the inference results. However, cloud-only inference limits the ubiquitous deployment of AI services. Specifically, it cannot guarantee the delay requirements of real-time services, e.g., real-time detection with strict latency demands. Moreover, for important data sources, data safety and privacy protection must be addressed. To deal with these issues, AI services tend to resort to edge computing. AI models should therefore be further customized to fit the resource-constrained edge, while carefully balancing the trade-off between inference accuracy and execution latency.




Copyright information

© 2020 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter


Cite this chapter

Wang, X., Han, Y., Leung, V.C.M., Niyato, D., Yan, X., Chen, X. (2020). Artificial Intelligence Inference in Edge. In: Edge AI. Springer, Singapore. https://doi.org/10.1007/978-981-15-6186-3_5


  • DOI: https://doi.org/10.1007/978-981-15-6186-3_5

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-6185-6

  • Online ISBN: 978-981-15-6186-3

  • eBook Packages: Computer Science (R0)
