Abstract
As edge devices become more capable of processing images and video streams, novel deep vision applications are rapidly emerging. Lightweight neural networks have proven effective in supporting such applications, and existing solutions often adjust the model along various dimensions to meet specific application requirements. However, because applications differ in their accuracy, latency, and memory usage requirements, no single lightweight technique can satisfy all of these metrics. In addition, video stream processing on mobile devices often falls short of real-time performance. This paper therefore proposes LRNAS, a lightweight and real-time framework based on neural architecture search. We develop a lightweight network search algorithm that employs evolutionary strategies and multi-objective optimization to tailor network designs to diverse vision tasks. To further reduce inference latency, we design a video frame filtering strategy that uses motion vectors and inertial sensors to filter out redundant video frames. Experiments on two public datasets and one custom dataset demonstrate the effectiveness of LRNAS in improving the performance of mobile deep vision applications.
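The abstract does not give the details of the search algorithm or the filtering rule, so the following is only a minimal sketch of the two ideas it names: keeping the non-dominated (Pareto-optimal) candidates when accuracy, latency, and memory are traded off against each other, and dropping a frame when both image motion and device motion are small. The `Candidate` type, the function names, the metric values, and the thresholds are all hypothetical and are not taken from LRNAS.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    """A hypothetical searched network with its measured metrics."""
    name: str
    accuracy: float    # higher is better
    latency_ms: float  # lower is better
    memory_mb: float   # lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on every metric and strictly better on one."""
    no_worse = (a.accuracy >= b.accuracy
                and a.latency_ms <= b.latency_ms
                and a.memory_mb <= b.memory_mb)
    strictly_better = (a.accuracy > b.accuracy
                       or a.latency_ms < b.latency_ms
                       or a.memory_mb < b.memory_mb)
    return no_worse and strictly_better


def pareto_front(population: List[Candidate]) -> List[Candidate]:
    """Return the candidates that no other candidate dominates."""
    return [c for c in population
            if not any(dominates(other, c) for other in population)]


def keep_frame(mean_motion_px: float, gyro_rad_s: float,
               motion_thresh: float = 1.5, gyro_thresh: float = 0.2) -> bool:
    """Keep a frame only if image motion or device rotation exceeds a threshold.

    mean_motion_px is an assumed mean motion-vector magnitude for the frame;
    gyro_rad_s is an assumed angular speed from the inertial sensor.
    Both thresholds are illustrative values, not tuned ones.
    """
    return mean_motion_px > motion_thresh or gyro_rad_s > gyro_thresh


# Illustrative population; the numbers are made up for the example.
population = [
    Candidate("net_a", 0.92, 40.0, 30.0),
    Candidate("net_b", 0.90, 25.0, 20.0),
    Candidate("net_c", 0.89, 30.0, 25.0),  # worse than net_b on every metric
    Candidate("net_d", 0.94, 80.0, 60.0),
]
front = pareto_front(population)
print([c.name for c in front])
```

In an evolutionary search, a filter of this kind would typically be applied each generation to decide which candidates survive, and an application would then pick the point on the front that fits its own accuracy, latency, and memory budget.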
Data availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
About this article
Cite this article
Wang, Z., Wang, J., Li, S. et al. A lightweight and real-time responsive framework for various visual tasks via neural architecture search. CCF Trans. Pervasive Comp. Interact. (2024). https://doi.org/10.1007/s42486-024-00157-w
DOI: https://doi.org/10.1007/s42486-024-00157-w